Abstract


Keeping the local times of processes in a distributed system synchronized in the presence of 
arbitrary faults is important in many applications and is an interesting theoretical problem in its 
own right. In order to be practical, any algorithm to synchronize clocks must be able to deal with 
process failures and repairs, clock drift, and varying message delivery times, but these conditions 
complicate the design and analysis of algorithms. In this thesis, a general formal model to 
describe a system of distributed processes, each of which has its own clock, is presented. The 
processes communicate by sending messages to each other, and they can set timers to cause 
themselves to take steps at some future times. It is proved that even if the clocks run at a perfect 
rate and there are no failures, an uncertainty of ε in the known message delivery time makes it 
impossible to synchronize the clocks of n processes any more closely than 2ε(1 - 1/n). A simple 
algorithm that achieves this bound is given to show that the lower bound is tight. 


Two fault-tolerant algorithms are presented and analyzed, one to maintain synchronization 
among processes whose clocks initially are close together, and another to establish 
synchronization in the first place. Both handle drift in the clock rates, uncertainty in the message 
delivery time, and arbitrary failure of just under one third of the processes. The maintenance 
algorithm can be modified to allow a failed process that has been repaired to be reintegrated into 
the system. A variant of the maintenance algorithm is used to establish the initial synchronization. 
It was also necessary to design an interface between the two algorithms since we envision the 
processes running the start-up algorithm until the desired degree of synchronization is obtained, 
and then switching to the maintenance algorithm. 


Chapter One 


Introduction 


1.1 The Problem 


Keeping the local times of processes in a distributed system synchronized in the presence of 
arbitrary faults is important in many applications and is an interesting problem in its own right. In 
order to be practical, any algorithm to synchronize clocks must be able to deal with process 
failures and repairs, clock drift, and varying message delivery times, but these conditions 
complicate the design and analysis of algorithms. 


In this thesis we describe a formal model for a system of distributed processes with clocks, and 
demonstrate a lower bound on how closely the clocks can be synchronized, even when strong 
assumptions are made about the behavior of the system. Then we describe and analyze 
algorithms to establish and maintain synchronization under more realistic assumptions. 


We assume a collection of processes that communicate by sending messages over a reliable 
medium. Each process has a physical clock, not under its control, that is incremented in some 
relationship with real time. By adding the value of a local variable to the value of the physical 
clock, the process obtains its local time. 


The design of a clock synchronization algorithm must take into account the following factors. 


1. The uncertainty in the message delivery time. Messages are assumed in this thesis to 
be delivered a fixed amount of time after they are sent, plus or minus some 
uncertainty. 


2. Clock drift. Are the processes’ clock rates fast or slow relative to real time? If the 
clocks drift, then the synchronization procedure must be repeated periodically to 
keep the clocks synchronized. 


3. Are the clocks initially synchronized? If they are, then the problem of synchronizing 
the clocks is already solved unless the clocks drift, since once nondrifting clocks are 
synchronized, they stay synchronized. 


4. Fault tolerance. What kinds of faults (if any) are tolerated? This thesis does not 
consider communication link failures. A certain proportion of the processes, 
however, may be faulty in the worst possible way, by sending arbitrary messages at 
arbitrary times. 


5. Digital signatures. Can a faulty process forge a message from another process? If 
digital signatures are available, then process p can tell process q that it received a 
message x from process r only if such was actually the case. This obviously reduces 
the power of a faulty process to create havoc. Some of the other clock 
synchronization algorithms in the literature [5, 7] need this capability, but ours do not. 


6. Reintegration. In order to be practical, a synchronization algorithm must allow faulty 
processes that have recovered to be reintegrated into the system. 


7. Size of the adjustment. Particularly when the synchronization procedure is 
performed periodically, the amount by which the clock is changed should not be too 
big. 


1.2 Results of the Thesis 


1.2.1 Model 
One of the contributions of this thesis is a precise formal model of a system of distributed 
processes, each of which has its own clock. Within the model, lower bound proofs can be seen to 
be rigorous, and the effects of algorithms, once they are stated in a language that maps to the 
model, can be discerned unambiguously. The model is described in Chapter 2. 


We model the situation in which each process has a physical clock that is not under its control. 
By adding some value to the physical clock time a process obtains a local time. A process can set 
a timer to go off at a specified time in the future. Formally, timers are treated similarly to 
messages between processes. The system is interrupt-driven in that a process only takes a step 
when a message arrives. The message may come from another process, or it may be a timer that 
was set by the process itself. Thus, by using a timer, a process can ensure that an interrupt will 
occur at a specified time in the future. 


A process is modelled as an automaton, with states and a transition function. One of the 
arguments to the transition function is a real number, representing the time on the process’ clock. 
Clocks are modelled as real-valued functions from real time to clock time. We assume that the 
communication network is fully connected, so that every process can send a message directly to 
every other process. Processes possess the capability of broadcasting a message to all the 
processes at the same time. The message system is described as a buffer that holds messages 
until they are delivered. All messages are delivered within a fixed amount of time plus or minus 
some uncertainty. The delivery of a message at a process is the only type of event we consider. A 
system execution consists of sequences of "actions", each of which is a process event 
surrounded by a description of the state of the system, one sequence for each real time of 
interest. The sequences must satisfy certain natural consistency and correctness conditions. 


1.2.2 Lower Bound 

Even if the simplifying assumptions are made that clocks run at a perfect rate and that there are 
no failures, the presence of an uncertainty of ε in the message delivery time alone prevents any 
algorithm from exactly synchronizing clocks that initially have arbitrary values. We show in 
Chapter 3 that 2ε(1 - 1/n) is a lower bound on how closely the clocks of n processes can be 
synchronized in this case. Of course, in this case, any algorithm which synchronizes the clocks 
once causes them to remain synchronized. However, since these are strong assumptions, this 
lower bound also holds for the more realistic case in which clocks do drift and arbitrary faults 
occur. Just to show that this bound is tight, we describe an algorithm that achieves this bound for 
the simplified case. 
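

As a worked instance of the bound, with illustrative numbers that are assumptions and not taken from the thesis, consider n = 4 processes and a delivery-time uncertainty of ε = 1 ms:

```latex
\[
2\varepsilon\left(1 - \frac{1}{n}\right)
  = 2 \cdot 1\,\mathrm{ms} \cdot \left(1 - \frac{1}{4}\right)
  = 1.5\,\mathrm{ms}
\]
```

For n = 2 the same expression gives ε, and as n grows it approaches 2ε.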


1.2.3 Maintaining Synchronization 

We describe a synchronization algorithm in Chapter 4 that handles clock drift, uncertainty in the 
message delivery time and arbitrary process faults. The algorithm requires the clocks to be 
initially close together and less than one third of the processes to be faulty. 


Our algorithm runs in rounds, resynchronizing every so often to correct for the clocks drifting out 
of synchrony, and using a fault-tolerant averaging function based on those in [1] to calculate an 
adjustment. The size of the adjustment made to a clock at each round is independent of the 
number of faulty processes. At each round, n² messages are required, where n is the total 
number of processes. The closeness of synchronization achieved depends only on the initial 
closeness of synchronization, the message delivery time and its uncertainty, and the drift rate. 
Since the closeness of synchronization depends on the initial closeness, this is, in the terminology 
of [7], an interactive convergence algorithm. We give explicit bounds on how the difference 
between the clock values and real time grows. The algorithm can be easily adapted to become a 
reintegration procedure for repaired processes. 


At the beginning of each round, every nonfaulty process broadcasts its clock value and then waits 
a bounded amount of time, measured on its logical clock, long enough to ensure that clock values 
are received from all nonfaulty processes. After waiting, the process averages the arrival times of 
all the messages received, using a particular fault-tolerant averaging function. The resulting 
average is used to calculate an adjustment to the process’ clock. 


The fault-tolerant averaging function is derived from those used in [1] for reaching approximate 
agreement. The function is designed to be immune to some fixed maximum number, f, of faults. It 
first throws out the f highest and f lowest values, and then applies some ordinary averaging 
function to the remaining values. We choose the midpoint of the range of the remaining values, to 
be specific. The properties of the fault-tolerant averaging function allow the distance between the 
clocks to be halved, in a rough sense, at each round. Consequently, the averaging function can 
be considered the heart of the algorithm. 
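

The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's own formulation: the function name `fault_tolerant_midpoint` and the sample values are assumptions made for the example.

```python
def fault_tolerant_midpoint(values, f):
    """Discard the f highest and f lowest values, then return the
    midpoint of the range of the values that remain."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values to discard f from each end")
    kept = sorted(values)[f:len(values) - f]   # drop f lowest and f highest
    return (kept[0] + kept[-1]) / 2            # midpoint of the remaining range


# Four readings, one of them wildly wrong (as a faulty process might report).
readings = [100, 102, 99, 5000]
result = fault_tolerant_midpoint(readings, f=1)
```

With f = 1 the outlier 5000 and the lowest reading 99 are discarded, so a single arbitrary value cannot pull the result outside the range of the other readings.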


This algorithm can maintain a closeness of synchronization of approximately 4ε, where ε is the 
uncertainty in the message delivery time. 


1.2.4 Establishing Synchronization 

The problem solved by the algorithm in Chapter 4 is only that of maintaining synchronization of 
local times once it has been established. There is, of course, the separate problem of establishing 
such synchronization in the first place among processes whose clocks have arbitrary values. A 
variant of the maintenance algorithm can be used to establish the initial synchronization as well 
and is described in Chapter 5. The algorithm handles arbitrary failures of the processes, 
uncertainty in the message delivery time, and clock drift. It was also necessary to design an 
interface between the two algorithms since we envision the processes running this algorithm until 
the desired degree of synchronization is obtained, and then switching to the maintenance 
algorithm. 


The structure of the algorithm is similar to that of the algorithm which maintains synchronization. 
It runs in rounds. During each round, the processes exchange clock values and use the same 
fault-tolerant averaging function as before to calculate the corrections to their clocks. However, 
each round contains an additional phase, in which the processes exchange messages to decide 
that they are ready to begin the next round. 


This algorithm also synchronizes the clocks to within about 4ε. Again, the fault-tolerant averaging 
function used in the algorithm causes the difference in the clocks to be cut in half at each round. 


1.3 Related Work 


The problem of synchronizing clocks has been a topic of interest recently. A seminal paper was 
Lamport’s work [6], defining logical clocks and describing an algorithm to synchronize them. 
Several algorithms to synchronize real time clocks have appeared in the literature [5, 6, 7, 9]. 
Those of Lamport [6] and Marzullo [9] have the processes updating their clocks whenever they 
receive an appropriate message; these messages are assumed to arrive every so many real 
seconds, or more often. In contrast, the algorithms in Halpern, Simons and Strong [5], Lamport 
and Melliar-Smith [7], and this thesis run in rounds. During a round, a process updates its clock 
once. The rounds are determined by the times at which different processes’ local clocks reach 
the same times. There is an impossibility result due to Dolev, Halpern and Strong [2], showing 
that it is impossible to synchronize clocks without digital signatures if one third or more of the 
processes are subject to Byzantine failures. Dolev, Halpern and Strong's paper [2] also contains 
a lower bound similar to ours (proved independently), but characterizing the closeness of 
synchronization obtainable along the real time axis, that is, a lower bound on how closely in real 
time two processes’ clocks can read the same value. 


The three algorithms of Lamport and Melliar-Smith [7], as well as our maintenance algorithm, 
require a reliable, completely connected communication network, and handle arbitrary process 
faults. The first algorithm works by having each process at every round read all the other 
processes’ clocks and set its clock to the average of those values that aren’t too different from its 
own. The size of the adjustment is no more than the amount by which the clocks differ plus the 
uncertainty in obtaining the other processes’ clock values. However, the closeness of the 
synchronization achieved depends on the total number of processes, n. The message complexity 
is n² at each round, if getting another process’ clock value is equated with sending a message. 


In the other two algorithms in [7], each process sets its clock to the median of the values obtained 
by receiving messages from the other processes. To make sure each nonfaulty process has the 
same set of values, the processes execute a Byzantine Agreement protocol on the values. The 
two algorithms use different Byzantine Agreement protocols. One of the protocols doesn't 
require digital signatures, whereas the other one does. As a result, the clock synchronization 
algorithm derived from the latter will work even if almost one half of the processes are faulty, while 
the other two algorithms in [7] can only handle less than one third faulty processes. For both of 
the Byzantine clock synchronization algorithms, the closeness of synchronization and the size of 
the adjustment depend on the number of faulty processes, and the number of messages per 
round is exponential in the number of faults. 


The algorithm of Halpern, Simons and Strong [5] works in the presence of any number of process 
and link failures as long as the nonfaulty processes can still communicate. It requires digital 
signatures. When a process’ clock reaches a certain value (decided on in advance), it broadcasts 
that time. If it receives a message containing the value not too long before it reaches the value, it 
updates its clock to the value and relays the message. The closeness of synchronization depends 
only on the drift rate, the round length, the message delivery time, and the diameter of the 
communication graph after the faulty elements are removed. The message complexity per round 
is n². However, the size of the adjustment depends on the number of faulty processes. 


The framework and error model used by Marzullo in [9] make a direct comparison of his results 
with ours difficult. He considers intervals of time and analyzes the error probabilistically. 


The problem addressed in these papers is only that of maintaining synchronization of local times 
once it has been established. None of them explicitly discusses any sort of validity condition, 
quantifying how clock time increases in relation to real time. Only [5] includes a reintegration 
procedure for repaired processes. 


Chapter Two 


Formal Model 


2.1 Introduction 


We present a formal model for describing a system of distributed processes, each of which has its 
own clock. The processes communicate by sending messages to each other, and they can set 
timers to cause themselves to take steps at some specified future times. The model is designed to 
handle arbitrary clock rates, Byzantine process failures, and a variety of assumptions about the 
behavior of the message system. 


The advantages of a formal model are that lower bound proofs can be seen to be rigorous, and 
the effects of an algorithm, once it is stated in a language that maps to the model, can be 
discerned unambiguously. 


This model will be used in subsequent chapters to describe our particular versions of the clock 
synchronization problem. 


2.2 Informal Description 


We model a distributed system consisting of a set of processes that communicate by sending 
messages to each other. Each process has a physical clock that is not under its control. 


A typical message consists of text and the sending process’ name. There are also two special 
messages, START, which comes from an external source and indicates that the recipient should 
begin the algorithm, and TIMER, which a process receives when its physical clock has reached a 
designated time. 


A process is modelled as an automaton with a set of states and a transition function. The 
transition function describes the new state the process enters, the messages it sends out, and the 
timers it sets for itself, all as a function of the process’ current state, received message and 
physical clock time. An application of the transition function constitutes a process step, the only 
kind of event in our model. 


The system is interrupt-driven in that a process only takes a step when a message arrives. The 
message may come from another process, or it may be a TIMER message that was sent by the 
process itself. Thus, by using a TIMER message, a process can ensure that an interrupt will occur 
at a specified time in the future. We neglect local processing time by assuming that the 
processing of an arriving message is instantaneous. 


We assume that the communication network is fully connected, so that every process can send a 
message directly to every other process. Processes possess the capability of broadcasting a 
message to all the processes at one step. The message system is described as a buffer that holds 
messages until they are delivered. 


System histories consist of sequences of "actions", each of which is a process event surrounded 
by a description of the state of the system, one sequence for each real time of interest. The 
sequences must satisfy certain natural consistency and correctness conditions. We introduce the 
notion of "shifting" the real times at which a particular process’ steps occur in a history and note 
the resulting changes to the message delivery times. Finally, we define an execution to be a 
history in which the message system behaves as desired. 


2.3 Systems of Processes 


Let P be a fixed set of process names. Let X be a fixed set of message values. Then M, the set of 
messages, is {START, TIMER} ∪ (X × P). A process receives a START message as an external 
indication of the beginning of an algorithm. A process receives a TIMER message when a 
specified time has been reached on its physical clock. All other messages consist of a message 
value and a process name, indicating the sender of the message. 


Let F(S) denote the finite subsets of the set S. 


A process p is modelled as an automaton. It has a set Q of states, with a distinguished subset I of 
initial states, and a distinguished subset F of final states. It has a transition function, τ_p, where 
τ_p: Q × R × M → Q × F(X × P) × F(R). The transition function maps p's state, a real number indicating its 
physical clock time, and an incoming message, all to a new state for p, a finite set of (message 
value, destination) pairs, and a finite set of times at which to set timers. For any r in R, m in M, Y in 
F(X × P), and Z in F(R), if q is in F and if τ_p(q,r,m) = (q’,Y,Z), we require that q’ also be in F. That is, 
once a process is in a final state, it can never change to a non-final state. 
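

To make the shape of a transition function concrete, here is a small Python sketch of a toy process. Everything in it is a hypothetical illustration: the state names, the message value "hello", the destination "q" and the 5-unit timer are assumptions, not part of the thesis.

```python
from typing import FrozenSet, Tuple, Union

START, TIMER = "START", "TIMER"
Message = Union[str, Tuple[str, str]]      # START, TIMER, or (value, sender)
Output = Tuple[str, FrozenSet[Tuple[str, str]], FrozenSet[float]]


def toy_transition(state: str, clock: float, msg: Message) -> Output:
    """Map (state, physical clock time, incoming message) to
    (new state, set of (value, destination) pairs, set of timer times)."""
    if state == "done":
        # Final states are absorbing: once final, always final.
        return ("done", frozenset(), frozenset())
    if msg == START:
        # Send "hello" to q and set a timer 5 clock units ahead.
        return ("waiting", frozenset({("hello", "q")}), frozenset({clock + 5.0}))
    if msg == TIMER:
        return ("done", frozenset(), frozenset())
    return (state, frozenset(), frozenset())   # ignore other messages


first = toy_transition("idle", 3.0, START)
```

The function is deterministic in its three inputs, which is the property the later equivalence arguments rely on.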


We assume that, in the absence of non-TIMER messages, a process does not set an infinite 
sequence of timers for itself within a finite amount of time. To state this condition formally, we 
choose any time r_0 and state q_0 for p, and consider the following sequence of applications of τ_p: 


\[
\begin{aligned}
\tau_p(q_0, r_0, \mathrm{TIMER}) &= (q_1, Y_1, Z_1) \\
\tau_p(q_1, r_1, \mathrm{TIMER}) &= (q_2, Y_2, Z_2), && \text{where } r_1 = \min\{r \in Z_1 : r > r_0\} \\
&\;\;\vdots \\
\tau_p(q_i, r_i, \mathrm{TIMER}) &= (q_{i+1}, Y_{i+1}, Z_{i+1}), && \text{where } r_i = \min\{r \in \textstyle\bigcup_{j=1}^{i} Z_j : r > r_{i-1}\}
\end{aligned}
\]


Then as i approaches ∞, it must be that r_i approaches ∞. 


We define a step of p to be a tuple (q,r,m,q’,Y,Z) such that τ_p(q,r,m) = (q’,Y,Z). 


A clock is a monotonically increasing, everywhere differentiable function from R (real time) to R 
(clock time). We will employ the convention that clock names are capitalized and that the inverse 
of a clock has the same name but is not capitalized. Also, real times are denoted by small letters 
and clock times by capital letters. 


A system of processes, denoted (P,N,S), consists of a set of processes, one for each name in P, a 
nonempty subset N of P called the nonfaulty processes, and a nonempty subset S of P called the 
self-starting processes. (We will use P to denote both the set of names and the set of processes, 
relying on context to distinguish the two.) The nonfaulty processes represent those processes 
that are required to follow the algorithm. The self-starting processes are intended to model those 
that will begin executing the algorithm on their own, without first receiving a message. A system 
of processes with clocks, denoted (P,N,S,PH), is a system of processes (P,N,S) together with a set 
of clocks PH = {Ph_p}, one for each p in P. Clock Ph_p is called p's physical clock. The transition 
function for p is denoted by τ_p. Throughout this thesis we assume |P| = n. 


2.4 Message System 


We assume that every process can communicate directly with every process (including itself, for 
uniformity) at each step. The message system is modelled by a message buffer, which stores 
each message, together with the real times at which it is sent and delivered. For technical 
convenience, we do not require that messages be sent before being received. This correctness 
condition is imposed later. 


A state of the message buffer consists of a multiset of tuples, each of the form (p,x,q) or 
(TIMER,T,p) or (START,p), with associated real times of sending and delivery. The message (x,p) 
with recipient q is represented by (p,x,q). (TIMER,T,p) indicates a timer set for time T on p’s 
physical clock. (START,p) represents a START message with p as the recipient. 


An initial state of the message buffer is a state consisting of some set of START messages. The 
sending and delivery times are all initialized as ∞. 


The behavior of the message buffer is captured as a set of sequences of SEND and RECEIVE 
operations, each operation with its associated real time. Each operation involves a message 
tuple. The result of performing each operation is described below. 


SEND(u,t): the tuple u is placed in the message buffer with sending time t and delivery time ∞ as 
long as there is no u entry already in the message buffer with sending time ∞. If there is, then t is 
made the new sending time of the u entry with the earliest delivery time and sending time ∞. 


RECEIVE(u,t): the tuple u is placed in the message buffer with delivery time t and sending time 
∞, as long as there is no u entry already in the message buffer with delivery time ∞. If there is, 
then t is made the new delivery time of the u entry with the earliest sending time and delivery time 
∞. 


The message delay of a non-START message is the delivery time minus the sending time. A 
positive message delay means the message was sent before it was delivered. A negative message 
delay means the message was delivered before it was sent. A message delay of +∞ means the 
message was sent but never delivered, and a message delay of -∞ means the message was 
delivered, but never sent. (The message delay is not defined for START messages that are never 
delivered.) 
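

The buffer operations can be sketched in Python as follows. This is an illustrative reading of the SEND and RECEIVE rules for non-START tuples, assuming that an unset sending or delivery time is recorded as infinity; the class and method names are hypothetical.

```python
import math


class MessageBuffer:
    """Multiset of [tuple, sending time, delivery time] entries."""

    def __init__(self):
        self.entries = []

    def send(self, u, t):
        # Entries for u that were delivered but not yet sent.
        pending = [e for e in self.entries if e[0] == u and e[1] == math.inf]
        if pending:
            # Fill in the sending time of the one delivered earliest.
            min(pending, key=lambda e: e[2])[1] = t
        else:
            self.entries.append([u, t, math.inf])

    def receive(self, u, t):
        # Entries for u that were sent but not yet delivered.
        pending = [e for e in self.entries if e[0] == u and e[2] == math.inf]
        if pending:
            # Fill in the delivery time of the one sent earliest.
            min(pending, key=lambda e: e[1])[2] = t
        else:
            self.entries.append([u, math.inf, t])

    def delays(self):
        # Message delay = delivery time minus sending time.
        return [(e[0], e[2] - e[1]) for e in self.entries]


buf = MessageBuffer()
buf.send(("p", "x", "q"), 1.0)      # p sends x to q at real time 1.0
buf.receive(("p", "x", "q"), 3.5)   # q receives it at real time 3.5
```

Allowing RECEIVE before SEND is what lets a history contain negative delays; the correctness conditions that rule them out are imposed separately.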


2.5 Histories 


In this section we define a history, a construct that models a computation in which nonfaulty 
processes follow their state-transition functions. Constraints to ensure that the message system 
behaves correctly will be added in Section 2.8. 


Fix a system of processes and clocks 𝒮 = (P,N,S,PH). 


An event for P is of the form receive(m,p), the receipt of message m by process p, where p is in 
P. A schedule for P is a mapping from R (real times) to finite sequences of events for P such that 
only a finite number of events occur before any finite time, and for each real time t and process p, 
all TIMER events for p are ordered after all non-TIMER events for p. The first condition rules out a 
process taking an infinite number of steps in a finite amount of time, and the second condition 
allows messages that arrive at the same time as a timer goes off to get in "just under the wire". 


In order to discuss how an event affects the system as a whole, we define a configuration for P to 
consist of a state for each process in P and a state for the message buffer. An initial configuration 
for (P,N,S) consists of an initial state for each process and an initial state for the message buffer. 


An action for P is a triple (F,e,F’), consisting of an event for P and two configurations F and F’ for 
P. F is the preceding and F’ the succeeding configuration for the action. 


A history for 𝒮 is a mapping from real times to sequences of actions for (P,N,S) with the following 
properties: 


• the projection onto the events is a schedule; 


• if the sequence of actions is nonempty, then the preceding configuration of the first 
action is an initial configuration, and the succeeding configuration of each action is 
the same as the preceding configuration of the following action; 


• if an action (F,receive(m,p),F’) occurs at real time t, then F = F’ except for p's state 
and the state of the message buffer; moreover, there exist Y in F(X × P) and Z in F(R) 
such that the buffer in F’ is obtained from the buffer in F by executing the following 
operations: 


  ∘ if m = START, then RECEIVE((START,p),t); 
    if m = TIMER, then RECEIVE((TIMER,Ph_p(t),p),t); 
    if m = (x,p’) for some p’, then RECEIVE((p’,x,p),t); 


  ∘ SEND((p,x,p’),t) for all messages of the form (x,p’) in Y; 


  ∘ SEND((TIMER,T,p),t) for all T in Z such that T > r (that is, as long as the timer is 
    set for a future time); if T ≤ r, then no operation is performed. 


Furthermore, if p is in N, then (q,r,m,q’,Y,Z) is a step of p, where q is p's state in F, r = 
Ph_p(t), and q’ is p’s state in F’. 


The first condition merely ensures that only a finite number of occurrences take place by any 
finite time. The second condition states that the configurations match up correctly. The final 
condition causes the configurations to change according to the process’ transition function, if it is 
nonfaulty. Since a faulty process need not obey its transition function, it can send any messages 
and set any timers. 


Given 𝒮, an initial configuration F, and a schedule s, a history can be constructed inductively by 
starting with F and applying the transition functions as specified by the events in s to determine 
the next configuration. We will denote the history so derived by hist(s,F,𝒮). 


Define, for each process p and history h, first-step(h,p) = min{t: h(t) contains an event for p}. 
This is the earliest time at which a step is taken by p in h. If p never takes a step, then 
first-step(h,p) is ∞. Let first-step(h) = min_{p∈P}{first-step(h,p)}. This is the earliest time at which any 
process takes a step in h. Similarly, define, for each history h and nonfaulty process p, 
last-step(h,p) = min{t: h(t) contains a configuration in which p is in a final state}. This is the 
earliest time at which p is in a final state. Define last-step(h) = max_{p∈N}{last-step(h,p)}. This is the 
earliest time in h after which all nonfaulty processes are in final states. If some p in N never enters 
a final state in h, then last-step(h,p) and last-step(h) are ∞. 
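

The first-step definitions translate directly into code. The sketch below assumes a simplified representation that is not the thesis's: a history is a dictionary from real times to lists of (process, message) events, with configurations omitted.

```python
import math


def first_step(history, p=None):
    """Earliest real time at which p takes a step; with p=None, the
    earliest time at which any process takes a step. Returns infinity
    if there is no such step."""
    times = [t for t, events in history.items()
             if any(p is None or proc == p for proc, _ in events)]
    return min(times, default=math.inf)


# Hypothetical history: q starts at 1.0, p starts at 2.5 and has a timer at 4.0.
h = {1.0: [("q", "START")], 2.5: [("p", "START")], 4.0: [("p", "TIMER")]}
earliest_p = first_step(h, "p")
earliest_any = first_step(h)
```

The `default=math.inf` argument covers the case of a process that never takes a step.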


2.6 Chronicles 


In order to isolate the steps of an individual process in a history from the real times at which they 
occur, we define a chronicle. 


The chronicle of nonfaulty process p in history h is the sequence of tuples of the form 
(q_i, r_i, m_i, q_i’, Y_i, Z_i), which is derived as follows: if the i-th action for p occurs in h(t), then m_i is the 
message received in that action, q_i is the state of p in the preceding configuration of the action, r_i 
is p's physical clock reading at real time t, q_i’ is the state of p in the succeeding configuration, Y_i is 
the collection of messages to be sent to the message buffer, and Z_i is the collection of timers to be 
set. We know that each tuple is a step of p. 


Two histories, h for 𝒮 = (P,N,S,PH) and h’ for 𝒮’ = (P,N,S,PH’), are equivalent if, for each process 
p in N, the chronicle of p in h is the same as the chronicle of p in h’. 


2.7 Shifting 


Given a schedule s, nonfaulty process p, and real number ζ, define a new schedule s’ = 
shift(s,p,ζ) to be the same as s except that an event for p appears in s’(t) if and only if the same 
event appears in s(t + ζ), and the order of events for p is preserved. The result s’ can easily be 
seen to be a schedule also. All events involving p are shifted earlier by ζ if ζ is positive, and 
shifted later by -ζ if ζ is negative. 


A set of clocks PH = {Ph_q}_{q∈P} can also be shifted. Let PH’ = shift(PH,p,ζ) for p in N be the set of 
clocks defined by PH’ = {Ph_q’}_{q∈P}, where Ph_q’(t) = Ph_q(t) if q ≠ p, and Ph_p’(t) = Ph_p(t) + ζ. 
Process p’s clock has been shifted forward by ζ, but no other clocks are altered. 
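

A small Python sketch may help in seeing how the two shifts fit together. It rests on simplifying assumptions that are not from the thesis: a schedule is a flat list of (real time, process, message) events, the shift amount is called `zeta`, and p's physical clock runs at rate 1 with a hypothetical offset of 100.

```python
def shift_schedule(schedule, p, zeta):
    """Move every event of process p earlier by zeta; other events stay put.
    The stable sort keeps the relative order of p's events."""
    shifted = [(t - zeta if proc == p else t, proc, msg)
               for (t, proc, msg) in schedule]
    return sorted(shifted, key=lambda e: e[0])


def shift_clock(clock, zeta):
    """Return the clock whose reading at every real time is zeta larger."""
    return lambda t: clock(t) + zeta


schedule = [(2.0, "p", "START"), (3.0, "q", "START"), (7.0, "p", "TIMER")]
ph_p = lambda t: t + 100.0            # assumed rate-1 physical clock for p
zeta = 1.5

shifted = shift_schedule(schedule, "p", zeta)
ph_p_shifted = shift_clock(ph_p, zeta)

# Clock readings seen by p at its own events, before and after shifting.
before = [ph_p(t) for (t, proc, _) in schedule if proc == "p"]
after = [ph_p_shifted(t) for (t, proc, _) in shifted if proc == "p"]
```

In this example p's events move from real times 2.0 and 7.0 to 0.5 and 5.5, yet p reads 102.0 and 107.0 on its clock in both cases, so p cannot tell the two situations apart.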


Lemma 2-1 states that if a schedule and a set of clocks are shifted by the same amount relative to 
the same process, then the histories derived from those schedules and sets of clocks starting 
from the same initial configuration are equivalent. 


Lemma 2-1: Let 𝒮 = (P,N,S,PH) and 𝒮’ = (P,N,S,PH’), where PH’ = shift(PH,p,ζ) for 
some process p and real number ζ. Let s be a schedule for P and s’ = shift(s,p,ζ). Let 
F be an initial configuration for 𝒮 and 𝒮’. Then the history hist(s,F,𝒮) = h is equivalent 
to the history hist(s’,F,𝒮’) = h’. 


Proof: Let q be an arbitrary process in N. It suffices to show that the chronicle of q in h 
is the same as the chronicle of q in h’. 


Case 1: q ≠ p. We proceed by induction on the elements of the chronicles. Let q’s 
chronicle in h be (m_i, qc_i, Ph_q(t_i), qn_i, Y_i, Z_i) and in h’ be (m_i’, qc_i’, Ph_q’(t_i’), qn_i’, Y_i’, Z_i’). (qc 
stands for current state, qn for next state.) 


Basis: i = 1. Then t_1 = first-step(h,q) and t_1’ = first-step(h’,q). By construction of h’, 
these real times are the same. Therefore, m_1 = m_1’. Since F is the initial configuration 
in both h and h’, qc_1 = qc_1’. Ph_q(t_1) = Ph_q’(t_1’) since Ph_q = Ph_q’ by construction. 
Finally, qn_1 = qn_1’, Y_1 = Y_1’, and Z_1 = Z_1’ since τ_q is deterministic and the inputs are 
the same. 


Induction: Assume the elements are the same up to i - 1, and show that the i-th 
elements are the same. Again, m_i = m_i’ by construction of h’; qc_i = qc_i’ by the 
induction hypothesis since qc_i = qn_{i-1} = qn_{i-1}’ = qc_i’; Ph_q(t_i) = Ph_q’(t_i’) as before; 
finally qn_i = qn_i’, Y_i = Y_i’, and Z_i = Z_i’ because τ_q is deterministic. 


Case 2: q = p. Again we proceed by induction on the elements of the chronicles. Let 
p's chronicle in h be (m,,qc,,Ph p(t)an,¥;,2,) and in h' be (m,’,q¢,’,Ph p ),an,¥)2;). 


First we note that by construction, t, = t,' + § for alli. 


Basis: i = 1. By construction, m, = m,’. Since F is the initial configuration in both h 
and h’, qc, = qc,’. Phiit,) = Ph, '(t,’) since Pht.) = Phy '(t,-$) = Ph, '(t,'+ $-$). 
Finally, qn, = qn,', Y, = Y,’, and Z, = Z,’ since 7) is deterministic and the inputs are 


the same. 


Induction: assume the elements are the same up to i - 1, and show that the i-th 
elements are the same. m, = m;’ by construction of h’; qc, = qc’ by the induction 
hypothesis; Ph (t) = Ph,’ (t’ ) by the same argument as in the basis case; and again qn, 
= qn’, i =Y), and Zz = Zz’ since e is deterministic. 8 


The next lemma quantifies the changes to the message delays in a history when its schedule and set of clocks are shifted by the same amount relative to the same process.
Lemma 2-2: Let $\mathcal{S} = (P,N,S,PH)$ and $\mathcal{S}' = (P,N,S,PH')$, where $PH' = \mathrm{shift}(PH,p,\zeta)$ for some p in P and real number $\zeta$. Let s be a schedule for P and $s' = \mathrm{shift}(s,p,\zeta)$. Let F be an initial configuration for $\mathcal{S}$ and $\mathcal{S}'$. Then there is a one-to-one correspondence between the tuples in the message buffer in $h = \mathrm{hist}(s,F,\mathcal{S})$ and $h' = \mathrm{hist}(s',F,\mathcal{S}')$, and the message delays for corresponding elements will be the same in the two histories (if defined) except for two cases:

1. if the delay for any tuple of the form (p,x,q) is $\mu$ in h for any process $q \neq p$ and message value x, then the delay for the corresponding element in h' will be $\mu + \zeta$; and

2. if the delay for any tuple of the form (q,x,p) is $\mu$ in h for any process $q \neq p$ and message value x, then the delay for the corresponding element in h' will be $\mu - \zeta$.


Proof: By Lemma 2-1, h and h’ are equivalent. Therefore, the chronicles of all the 
processes are the same. The same messages are sent and received at the same 
physical clock times in h' and h. Also, the message buffers have the same START 
elements since the initial configuration is the same for both. Therefore, each element 
of the message buffer in h has a corresponding one in h' and vice versa. 


START messages are still either received at some finite time or not, thus START 
elements have the same delays in the two histories. Since only p's clock is shifted, the 
clocks of the other processes will bear the same relationship to real time in h’ as in h, 
causing the delays for messages between processes other than p and the delays of 
timers for processes other than p to be the same in the two histories. The delays of timers for p will be the same as well, since they are both set and received $\zeta$ earlier in h' than in h.


Choose $q \neq p$.

1. Suppose (p,x,q) is sent at t and received at t' in h, so that its delay in h is $\mu = t' - t$. The relationship between s and s' implies that (p,x,q) is sent at $t - \zeta$ and received at t' in h'. Thus the message delay in h' is $t' - (t - \zeta) = \mu + \zeta$.

2. Suppose (q,x,p) is sent at t and received at t' in h, so that its delay in h is $\mu = t' - t$. The relationship between s and s' implies that (q,x,p) is sent at t and received at $t' - \zeta$ in h'. Thus the message delay in h' is $t' - \zeta - t = \mu - \zeta$. ∎
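The effect described by Lemma 2-2 can be checked on a toy example. The sketch below is only an illustration under simplifying assumptions: a message is reduced to its send and receive real times, and the names `Message` and `shift_message` are hypothetical, not part of the formal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    sent_at: float      # real time of the send event
    received_at: float  # real time of the receive event

    @property
    def delay(self) -> float:
        return self.received_at - self.sent_at


def shift_message(msg: Message, p: str, zeta: float) -> Message:
    """Move every event of process p earlier by zeta; other events stay put."""
    sent = msg.sent_at - zeta if msg.sender == p else msg.sent_at
    received = msg.received_at - zeta if msg.receiver == p else msg.received_at
    return Message(msg.sender, msg.receiver, sent, received)


msgs = [
    Message("p", "q", 10.0, 13.0),  # from p: delay should grow by zeta
    Message("q", "p", 11.0, 15.0),  # to p: delay should shrink by zeta
    Message("q", "r", 12.0, 14.5),  # does not involve p: unchanged
]
shifted = [shift_message(m, "p", 0.5) for m in msgs]
delays_before = [m.delay for m in msgs]
delays_after = [m.delay for m in shifted]
```

With a shift of 0.5 relative to p, the delay of the message from p grows from 3.0 to 3.5, the delay of the message to p shrinks from 4.0 to 3.5, and the message between q and r keeps its delay of 2.5.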


2.8 Executions


Now we require correct behavior of the message system. Accordingly, we define an execution to 
be a history with the necessary properties. 


We fix for the remainder of the thesis two nonnegative constants $\delta$ and $\varepsilon$ with $\delta > \varepsilon$.


An execution for $\mathcal{S}$ is a history for $\mathcal{S}$ with four additional properties:
• the initial state of the message buffer consists exactly of a START message for each process in $S \cup (P - N)$, that is, for each self-starting process and each faulty process;

• all START messages for nonfaulty processes are received at some finite time;

• the message delay of any non-TIMER and non-START message is between $\delta - \varepsilon$ and $\delta + \varepsilon$ inclusive; and

• any (TIMER,T,p) element of the message buffer, for any T and p, has finite message delay and is delivered at $Ph_p^{-1}(T)$.


The intent of the first condition is to model the self-starting processes as those processes that 
begin the algorithm on their own, and to allow the faulty processes to begin their bad behavior at 
arbitrary times. The second condition states that nonfaulty self-starting processes all receive their 
START messages. The third condition guarantees that all interprocess messages arrive at their destinations within $\delta$ of being sent, subject to an uncertainty of $\varepsilon$. The fourth condition ensures that a timer goes off if and only if it was previously set and that it goes off at the right time.


2.9 Logical Clocks 


Each process p has as part of its state a local variable CORR, which provides a correction to its physical clock to yield the local time. During an execution, p's local variable CORR takes on different values. Thus, for a particular execution, it makes sense to define a function $CORR_p(t)$, giving the value of p's variable CORR at time t. For a particular execution, we define the local time for p to be the function $L_p$, which is given by $Ph_p + CORR_p$.

A logical clock of p is $Ph_p$ plus the value of $CORR_p$ at some time. Let $C_p^0$ denote the initial logical clock of p, given by $Ph_p$ plus the value of $CORR_p$ in p's initial state. Each time p adjusts its CORR variable, it is, in effect, changing to a new logical clock $C_p^i$ for some i. The local time can be thought of as a piecewise continuous function, each of whose pieces is part of a logical clock.
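The relationship between physical clock, correction variable, logical clocks, and local time can be sketched in a few lines. This is an illustrative model only: the class `LocalClock` and its method names are hypothetical, and adjustments are assumed to be recorded in increasing order of real time.

```python
import bisect


class LocalClock:
    """Local time L_p = Ph_p + CORR_p, built from successive logical clocks."""

    def __init__(self, physical, initial_corr):
        self.physical = physical            # function: real time -> physical clock reading
        self.change_times = []              # real times at which CORR was adjusted
        self.corr_values = [initial_corr]   # corr_values[i] defines logical clock C^i

    def adjust(self, real_time, amount):
        """Add amount to CORR at real_time, switching to the next logical clock."""
        self.change_times.append(real_time)
        self.corr_values.append(self.corr_values[-1] + amount)

    def corr(self, t):
        """Value of CORR in force at real time t."""
        return self.corr_values[bisect.bisect_right(self.change_times, t)]

    def local_time(self, t):
        return self.physical(t) + self.corr(t)

    def logical_clock(self, i, t):
        """C^i(t): what the i-th logical clock reads (or would read) at real time t."""
        return self.physical(t) + self.corr_values[i]


clock = LocalClock(physical=lambda t: t + 100.0, initial_corr=5.0)
clock.adjust(real_time=10.0, amount=-2.0)
```

Before real time 10 the local time follows the initial logical clock; from real time 10 on it follows the next one, so the local time is piecewise continuous with a jump of -2 at the adjustment.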
Chapter Three


Lower Bound 


3.1 Introduction 


In this chapter, we show a lower bound on how closely clocks can be synchronized, even if the 
clocks don't drift and no processes are faulty. Since these are strong assumptions, this lower 
bound also holds for the more realistic case in which clocks do drift and arbitrary faults occur. 
To show that the bound is tight, we present a simple algorithm that synchronizes the clocks as closely as the lower bound allows.


3.2 Problem Statement 


For this chapter alone we make the following assumptions: 
1. clocks don't drift, i.e. $dC_p(t)/dt = 1$ for all p and t;

2. all processes are nonfaulty, i.e. N = P. Therefore, we will omit "N" from the notation.


Since the processes have physical clocks which are progressing at the same rate as real time, the 
only part of the clock synchronization problem which is of interest is the problem of bringing the 
clocks into synchronization -- once this has been done, synchronization is maintained 
automatically. 


A clock synchronization algorithm (P,S) is $\gamma,\alpha$-correct if every execution h for (P,S,PH), for any set of clocks PH, satisfies the following three conditions:
1. Termination: All processes eventually enter final states. Thus, last-step(h) is defined.

2. Agreement: $|L_p(t) - L_q(t)| \le \gamma$ for any processes p and q and time t > last-step(h). We say h synchronizes to within $\gamma$.

3. Validity: For any process p there exist processes q and r such that $C_q^0(t) - \alpha \le L_p(t) \le C_r^0(t) + \alpha$ for all times t > last-step(h). This ensures that p's new logical clock isn't too much greater (or smaller) than the largest (or smallest) old logical clock would have been at this time. We say h bounds the adjustment within $\alpha$.


We will show that no algorithm can be $\gamma,\alpha$-correct for $\gamma < 2\varepsilon(1 - 1/n)$ and any $\alpha$, where $\varepsilon$ is the uncertainty in the message delivery time and n is the number of processes. Then we exhibit a simple algorithm that is $2\varepsilon(1 - 1/n),\varepsilon$-correct.


3.3 Lower Bound 


In this section we show that no algorithm can synchronize n processes' clocks any closer than $2\varepsilon(1 - 1/n)$.
Theorem 3-1: No clock synchronization algorithm can synchronize a system of n processes to within $\gamma$, for any $\gamma < 2\varepsilon(1 - 1/n)$.

Proof: Fix a system of processes (P,S) that synchronizes to within $\gamma$. We will show that $\gamma \ge 2\varepsilon(1 - 1/n)$.


Let P consist of processes $p_1$ through $p_n$. Consider the system $\mathcal{S}_1 = (P,S,PH_1)$. Consider an execution $h_1 = \mathrm{hist}(s_1,F,\mathcal{S}_1)$, for some schedule $s_1$ and initial configuration F, of any clock synchronization algorithm in which all messages from $p_j$ to $p_k$ have delay $\delta - \varepsilon$ if k > j, have delay $\delta + \varepsilon$ if k < j, and have delay $\delta$ if k = j.

Consider n - 1 additional histories, $h_2$ for system $\mathcal{S}_2$ through $h_n$ for $\mathcal{S}_n$. The systems are constructed inductively by letting $PH_i = \mathrm{shift}(PH_{i-1}, p_{i-1}, 2\varepsilon)$ and $\mathcal{S}_i = (P,S,PH_i)$. The histories are constructed inductively by letting $s_i = \mathrm{shift}(s_{i-1}, p_{i-1}, 2\varepsilon)$ and $h_i = \mathrm{hist}(s_i,F,\mathcal{S}_i)$. Stated informally, the i-th history is obtained from the (i-1)-st history by shifting the schedule and set of clocks by $2\varepsilon$ relative to the (i-1)-st process. Let $Ph_p^i$ be p's physical clock in $PH_i$.


By Lemma 2-1, all the $h_i$ are equivalent.

Next we show by induction on i that $h_i$ is an execution for $\mathcal{S}_i$, and further, that the delays in $h_i$ for messages from $p_j$ to $p_k$ are $\delta + \varepsilon$ if j < i and $k \ge i$, $\delta - \varepsilon$ if $j \ge i$ and k < i, and otherwise as in $h_1$.


Basis: $h_1$ is an execution and the message delays are as required by hypothesis.

Induction: Assume $h_i$ is an execution with the required message delays, and show that $h_{i+1}$ is also an execution with the required message delays.

• The initial state of the message buffer is the same in $h_{i+1}$ as in $h_i$ since both use initial configuration F. Thus the initial state is as required.

• The START messages are all received in $h_{i+1}$ as they are in $h_i$.

• By Lemma 2-2, a message in $h_{i+1}$ from $p_i$ to $p_m$, m > i, will have delay $\delta - \varepsilon + 2\varepsilon = \delta + \varepsilon$; one from $p_i$ to $p_m$, m < i, will have delay $\delta - \varepsilon + 2\varepsilon = \delta + \varepsilon$; one from $p_m$ to $p_i$, m > i, will have delay $\delta + \varepsilon - 2\varepsilon = \delta - \varepsilon$; and one from $p_m$ to $p_i$, m < i, will have delay $\delta + \varepsilon - 2\varepsilon = \delta - \varepsilon$. The others stay the same. Thus the delays are within the correct range.


• Now we need to show that timers are handled properly in $h_{i+1}$. Lemma 2-2 implies that the delays of timers are the same in $h_{i+1}$ as in $h_i$, thus they are finite. For all processes except $p_i$, the timers arrive at the same real times and the same clock times in $h_{i+1}$ as in $h_i$, and thus they arrive at the proper times in $h_{i+1}$. Consider a timer set by $p_i$ for T that arrives at $T = Ph_{p_i}^i(t)$ in $h_i$. In $h_{i+1}$ it arrives at $t - 2\varepsilon$, since all events of $p_i$ are shifted earlier by $2\varepsilon$. However, since $Ph_{p_i}^{i+1}(t - 2\varepsilon) = Ph_{p_i}^i(t) = T$, the timer arrives at the proper time in $h_{i+1}$.

Therefore, $h_{i+1}$ is an execution for $\mathcal{S}_{i+1}$.


Since $h_1$ was correct, it terminated; therefore, each $h_i$ also terminates. Let $t_f = \max_{i=1,\ldots,n}\{\text{last-step}(h_i)\}$. In execution $h_1$, the algorithm synchronizes all the processes' clocks to values $v_1$ through $v_n$ at time $t_f$, and all the values are within $\gamma$. In particular,

$v_n \le v_1 + \gamma$.

Since $h_i$ is equivalent to $h_1$, the correction variable for any process p will be the same in both executions at time $t_f$. In $h_i$, for $i \ge 2$, the value of $p_{i-1}$'s logical clock at $t_f$ will be $v_{i-1} + 2\varepsilon$ and the value of $p_i$'s logical clock at $t_f$ will be $v_i$ by the way $PH_i$ is defined. Since these values are within $\gamma$, we have

$v_{i-1} + 2\varepsilon \le v_i + \gamma$, that is, $v_{i-1} \le v_i + \gamma - 2\varepsilon$.

Putting together this chain of inequalities, we have
$v_n \le v_1 + \gamma \le \cdots \le v_i + (i-1)(\gamma - 2\varepsilon) + \gamma \le \cdots \le v_n + (n-1)(\gamma - 2\varepsilon) + \gamma$.

Therefore, $v_n \le v_n + (n-1)(\gamma - 2\varepsilon) + \gamma$, and so $0 \le (n-1)\gamma - (n-1)2\varepsilon + \gamma$. In order for this inequality to hold, it must be the case that $\gamma \ge 2\varepsilon(1 - 1/n)$. ∎
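The final step of the proof can be checked with exact arithmetic. The sketch below is illustrative only; the function names are hypothetical. It evaluates the bound $2\varepsilon(1 - 1/n)$ and confirms that the inequality $0 \le (n-1)\gamma - (n-1)2\varepsilon + \gamma$ holds at the bound and fails just below it.

```python
from fractions import Fraction


def lower_bound(eps, n):
    """Closest achievable synchronization for n processes with uncertainty eps."""
    return 2 * eps * (1 - Fraction(1, n))


def chain_is_satisfiable(gamma, eps, n):
    """The inequality the chain in the proof reduces to."""
    return (n - 1) * gamma - (n - 1) * 2 * eps + gamma >= 0


eps = Fraction(1, 10)
bounds = {n: lower_bound(eps, n) for n in (2, 3, 10)}
at_bound = chain_is_satisfiable(bounds[3], eps, 3)
just_below = chain_is_satisfiable(bounds[3] - Fraction(1, 1000), eps, 3)
```

For two processes the bound is $\varepsilon$; as n grows it approaches $2\varepsilon$.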


3.4 Upper Bound 


In this section we show that the 2e(1 - 1/n) lower bound is tight, by exhibiting a simple algorithm 
which synchronizes the clocks to within this amount. 


3.4.1 Algorithm 

There is an extremely simple algorithm that achieves the closest possible synchronization. As soon as each process p receives a message, it sends its local time in a message to the remaining processes and waits to receive a similar message from every other process. Immediately upon receiving such a message, say from q, p estimates q's current local time by adding $\delta$ to the value received. Then p computes the difference between its estimate of q's local time and its own current local time. After receiving local times from all the other processes, p takes the average of the estimated differences (including 0 for the difference between p and itself) and adds this average to its correction variable. Note that in contrast to many other agreement algorithms, in this one each process treats itself non-uniformly with the others.


Since it is obviously impractical to write algorithms in terms of transition functions, we have 
employed a clean, simple notation for describing interrupt-driven algorithms. To translate this 
notation into the basic model, we first assume that the state of a process consists of values for all 
the local variables, together with a location counter which indicates the next beginstep statement 
to be executed. The initial state of a process consists of the indicated initial values for all the local 
variables, and the location counter positioned at the first beginstep statement of the program. 


The transition function takes as inputs a state of the process, a message, and a physical time, and 
must return a new state and a collection of messages to send and timers to set. This is done as
follows. The beginstep statement is extracted from the given state. The local variables are 
initialized at the values given in the state. The parameter u is set equal to the message. The 
variable NOW is initialized at the given physical time + CORR. The program is then run from the 
given beginstep statement, just until it reaches an endstep statement. (If it never reaches an
endstep statement, the transition function takes on a default value.) The next beginstep after that 
endstep, together with the new values for all the local variables resulting from running the 
program, comprise the new state. The messages sent are all those which are sent during the 
running of the program, and similarly for the timers. 


There is a set-timer statement, which takes an argument U representing a logical time. The corresponding physical time, U - CORR, is the physical time described by the transition function. (This statement is not used in this algorithm but will be used later in the thesis.)


We will use the shorthand NOW to stand for the current logical clock time and ME for the id of the process running the code.


For this algorithm, initial states are those in which the location counter is at the beginning of the 
code, local variables CORR and V have arbitrary values, and local variables SUM and 
RESPONSES have value 0. Final states are those in which the location counter is at the end of 
the code. 


The code is in Figure 3-1. 


beginstep(u)
send(NOW) to all q ≠ ME
do forever
    if u = (v,q) for some message value v and process q then
        V := v + δ - NOW
        SUM := SUM + V
        RESPONSES := RESPONSES + 1
    endif
    if RESPONSES = n - 1 then exit endif
    endstep
    beginstep(u)
enddo
CORR := CORR + SUM/n
endstep
Figure 3-1: Algorithm 3-1, Synchronizing to within the Lower Bound

We will show that any execution h of Algorithm 3-1 is $\gamma,\alpha$-correct, where $\gamma = 2\varepsilon(1 - 1/n)$ and $\alpha = \varepsilon$. Thus, Algorithm 3-1 synchronizes the clocks to within $2\varepsilon(1 - 1/n)$, showing that the lower bound is tight. The upper bound isn't as unintuitive as it might look at first glance; it can be rewritten as $(2\varepsilon + (n-2)2\varepsilon)/n$, the average of the discrepancies in the estimated differences. The estimated differences of two processes for each other can differ by at most $\varepsilon$ apiece (giving the $2\varepsilon$ term), and their estimated differences for the other n - 2 processes can differ by up to $2\varepsilon$ apiece (giving the $(n-2)2\varepsilon$ term). Then the estimated differences are averaged, so the sum is divided by n. A more careful analysis is given below.
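A small simulation makes the averaging step concrete. The sketch below is not the thesis's transition-function model: it assumes drift-free clocks of the form $C_p^0(t) = t + \text{offset}_p$, lets every process send once at an arbitrary real time, and draws each delay uniformly from $[\delta - \varepsilon, \delta + \varepsilon]$. The function name `synchronize` and all parameter values are illustrative.

```python
import random


def synchronize(offsets, delta, eps, rng):
    """Apply the averaging rule once and return the new clock offsets.

    offsets[p] is p's initial logical clock minus real time.
    """
    n = len(offsets)
    send_time = [rng.uniform(0.0, 5.0) for _ in range(n)]  # arbitrary real send times
    new_offsets = []
    for p in range(n):
        total = 0.0  # SUM; the difference of p with itself contributes 0
        for q in range(n):
            if q == p:
                continue
            delay = rng.uniform(delta - eps, delta + eps)
            t_sent = send_time[q]
            t_recv = t_sent + delay
            v = t_sent + offsets[q]        # local time of q carried in the message
            now = t_recv + offsets[p]      # local time of p on receipt
            total += v + delta - now       # V := v + delta - NOW
        new_offsets.append(offsets[p] + total / n)  # CORR := CORR + SUM/n
    return new_offsets


rng = random.Random(1)
n, delta, eps = 5, 10.0, 1.0
offsets = [rng.uniform(-50.0, 50.0) for _ in range(n)]
after = synchronize(offsets, delta, eps, rng)
spread = max(after) - min(after)
bound = 2 * eps * (1 - 1 / n)
```

Under these assumptions the spread of the adjusted clocks stays within $2\varepsilon(1 - 1/n)$ no matter how far apart the initial offsets are, and every adjusted clock lies within $\varepsilon$ of the range spanned by the initial clocks.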


3.4.2 Preliminary Lemmas 
The next two results follow easily from the assumption that clocks don't drift. 
Lemma 3-2: For any p, $i \ge 0$, and times t and t', $C_p^i(t) - C_p^i(t') = t - t'$.
Proof: Immediate since the slope of $C_p^i$ is 1. ∎
Lemma 3-3: For any p and q, $i \ge 0$, and times t and t', $C_p^i(t') - C_q^i(t') = C_p^i(t) - C_q^i(t)$.

Proof: $C_p^i(t') - C_p^i(t) = t' - t = C_q^i(t') - C_q^i(t)$ by two applications of Lemma 3-2. The result follows. ∎

Now we can define the initial difference between two processes' clocks in execution h. Define $\Delta_{pq}$ to be $C_p^0(t) - C_q^0(t)$. That is, $\Delta_{pq}$ is the difference in local times before either of the processes has changed its correction variable. Since there is no drift in the clock rates, any time will give the same value.
Lemma 3-4: For any execution h, and processes p and q, $\Delta_{pq} = -\Delta_{qp}$.
Proof: Immediate from the definition of $\Delta$. ∎
Lemma 3-5: For any execution h, and processes p, q, and r, $\Delta_{pq} = \Delta_{pr} + \Delta_{rq}$.
Proof: Immediate from the definition of $\Delta$. ∎


3.4.3 Agreement 

For $q \neq p$, let $V_{qp}$ be the value of variable V in the code when q's message is being handled by p. $V_{qp} = L_q(t) + \delta - L_p(t')$, where local time $L_q(t)$ was sent by q at real time t and received by p at real time t'. Let $V_{pp} = 0$. We will denote SUM/n, p's addition to its correction variable, by $A_p$.


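First we relate the estimate $V_{qp}$ to the actual value $\Delta_{qp}$.
Lemma 3-6: $|V_{qp} - \Delta_{qp}| \le \varepsilon$.
Proof: Suppose at real time t, q sent the value $L_q(t)$, which was received by p at real time t'. Then

$$
\begin{aligned}
|V_{qp} - \Delta_{qp}| &= |L_q(t) + \delta - L_p(t') - \Delta_{qp}| = |C_q^0(t) + \delta - C_p^0(t') - \Delta_{qp}| \\
&= |C_p^0(t) + \Delta_{qp} + \delta - C_p^0(t') - \Delta_{qp}|, \text{ by definition of } \Delta_{qp} \\
&= |C_p^0(t) - C_p^0(t') + \delta| \\
&= |t - t' + \delta|, \text{ by Lemma 3-2} \\
&= |\delta - (t' - t)| \\
&\le |\delta - (\delta - \varepsilon)|, \text{ since } \delta - \varepsilon \text{ is the smallest message delay} \\
&= \varepsilon.
\end{aligned}
$$
∎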


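Here is the main result.
Theorem 3-7: (Agreement) Algorithm 3-1 guarantees clock synchronization to within $2\varepsilon(1 - 1/n)$.
Proof: We must show that for any execution h, any two processes p and q, and all times t after last-step(h),

$|L_p(t) - L_q(t)| \le 2\varepsilon - 2\varepsilon/n$.

Without loss of generality, assume $p = p_1$ and $q = p_2$, so that the remaining processes are $p_3$ through $p_n$. In the sums below, the index i abbreviates the process $p_i$. By the way the algorithm works,

$|L_p(t) - L_q(t)| = |(C_p^0(t) + A_p) - (C_q^0(t) + A_q)| = |\Delta_{pq} + A_p - A_q|$.

We know by definition of $A_p$ and $A_q$ that

$A_p = (1/n)(V_{pp} + V_{qp} + \sum_{i=3}^{n} V_{ip})$ and

$A_q = (1/n)(V_{pq} + V_{qq} + \sum_{i=3}^{n} V_{iq})$.

Substituting these values and noting that $V_{pp} = V_{qq} = 0$, we get

$$
\begin{aligned}
|L_p(t) - L_q(t)| &= \Big|\Delta_{pq} + (1/n)\Big(V_{qp} + \sum_{i=3}^{n} V_{ip} - V_{pq} - \sum_{i=3}^{n} V_{iq}\Big)\Big| \\
&= (1/n)\Big|n\Delta_{pq} + V_{qp} + \sum_{i=3}^{n} V_{ip} - V_{pq} - \sum_{i=3}^{n} V_{iq}\Big| \\
&= (1/n)\Big|(\Delta_{pq} + V_{qp}) + (\Delta_{pq} - V_{pq}) + \sum_{i=3}^{n} (\Delta_{pq} + V_{ip} - V_{iq})\Big| \\
&\le (1/n)\Big(|\Delta_{pq} + V_{qp}| + |\Delta_{pq} - V_{pq}| + \sum_{i=3}^{n} |\Delta_{pq} + V_{ip} - V_{iq}|\Big) \\
&\le (1/n)\Big(\varepsilon + \varepsilon + \sum_{i=3}^{n} |\Delta_{pq} + V_{ip} - V_{iq}|\Big), \text{ by Lemmas 3-6 and 3-4} \\
&= (1/n)\Big(2\varepsilon + \sum_{i=3}^{n} |\Delta_{pi} + \Delta_{iq} + V_{ip} - V_{iq}|\Big), \text{ by Lemma 3-5} \\
&= (1/n)\Big(2\varepsilon + \sum_{i=3}^{n} |(V_{ip} - \Delta_{ip}) - (V_{iq} - \Delta_{iq})|\Big), \text{ by Lemma 3-4} \\
&\le (1/n)\Big(2\varepsilon + \sum_{i=3}^{n} |V_{ip} - \Delta_{ip}| + \sum_{i=3}^{n} |V_{iq} - \Delta_{iq}|\Big) \\
&\le (1/n)\Big(2\varepsilon + \sum_{i=3}^{n} \varepsilon + \sum_{i=3}^{n} \varepsilon\Big), \text{ by Lemma 3-6} \\
&= (1/n)(2\varepsilon + (n-2)2\varepsilon) \\
&= 2\varepsilon(1 - 1/n).
\end{aligned}
$$
∎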


3.4.4 Validity 
The validity result states that each new logical clock is within $\varepsilon$ of what one of the initial logical clocks would have been.
Theorem 3-8: (Validity) Algorithm 3-1 bounds the adjustment within $\varepsilon$.
Proof: By definition, the amount to be added to $CORR_p$ is $A_p = (1/n)\sum_{q \in P} V_{qp}$. Then
$\min_{q \in P} V_{qp} \le A_p \le \max_{r \in P} V_{rp}$. Let q be the process with the minimum $V_{qp}$. Let r be the process with the maximum $V_{rp}$. Then,
$V_{qp} \le A_p \le V_{rp}$.
By applying Lemma 3-6 to each end of this inequality, we get
$\Delta_{qp} - \varepsilon \le V_{qp} \le A_p \le V_{rp} \le \Delta_{rp} + \varepsilon$.
Adding p's initial clock value $C_p^0(t)$, for any time t after last-step(h), we get
$C_p^0(t) + \Delta_{qp} - \varepsilon \le C_p^0(t) + A_p \le C_p^0(t) + \Delta_{rp} + \varepsilon$,
which together with the definition of $\Delta$ implies

$C_q^0(t) - \varepsilon \le L_p(t) \le C_r^0(t) + \varepsilon$. ∎
Chapter Four


Maintenance Algorithm 


4.1 Introduction 


This chapter consists of an algorithm to keep synchronized clocks that are close together initially, 
and an analysis of its performance concerning how closely the clocks are synchronized and how 
close the clocks stay to real time. The algorithm handles clock drift and arbitrary process faults. 
The algorithm requires the clocks to be initially close together and less than one third of the 
processes to be faulty. (Dolev, Halpern and Strong [2] show that it is impossible without
authentication to synchronize clocks unless more than two thirds of the processes are nonfaulty.)


This algorithm runs in rounds, resynchronizing periodically to correct for clock drift, and using a 
fault-tolerant averaging function based on those in [1] to calculate an adjustment. The size of the 
adjustment is independent of the number of faulty processes. At each round, $n^2$ messages are
required, where n is the total number of processes. The closeness of synchronization achieved
depends only on the initial closeness of synchronization, the message delivery time and its
uncertainty, and the drift rate. We give explicit bounds on how the difference between the clock
values and real time grows as time proceeds. The algorithm can be easily adapted to include
reintegration of repaired processes as described in Section 4.8.


4.2 Problem Statement 


We are now considering the situation in which clocks can drift slightly and some proportion of the 
processes can be faulty. Therefore, the statement of the problem differs from that in Chapter 3. 


For a very small constant $\rho > 0$, we define a clock C to be $\rho$-bounded provided that for all t
$1 - \rho \le 1/(1 + \rho) \le dC(t)/dt \le 1 + \rho \le 1/(1 - \rho)$.
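The two outer inequalities in this chain do not constrain the clock; they hold for every $0 < \rho < 1$ and are stated only because both forms are convenient later. A one-line check, added here as a worked step rather than taken from the thesis:

```latex
(1-\rho)(1+\rho) = 1 - \rho^{2} \le 1
\quad\Longrightarrow\quad
1-\rho \le \frac{1}{1+\rho}
\quad\text{and}\quad
1+\rho \le \frac{1}{1-\rho}
\qquad (0 < \rho < 1).
```

Both implications follow by dividing the left-hand inequality by the positive quantity $1+\rho$ or $1-\rho$ respectively.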


We make the following assumptions: 


1. All clocks are $\rho$-bounded, including those of faulty processes, i.e., the amount by which a clock's rate is faster or slower than real time is at most $\rho$. (Since faulty processes are permitted to take arbitrary steps, faulty clocks would not increase their power to affect the behavior of nonfaulty processes.)


2. There are at most f faulty processes, for a fixed constant f, and the total number of 
processes in the system, n, is at least 3f + 1. 


3. A START message arrives at each process p at time $T^0$ on its initial logical clock $C_p^0$, and $t_p^0$ is the real time when this occurs. Furthermore, the initial logical clocks are closely synchronized, i.e., $|c_p^0(T^0) - c_q^0(T^0)| \le \beta$, for some fixed $\beta$ and all nonfaulty p and q.

We let $tmax^0 = \max_{p \text{ nonfaulty}}\{t_p^0\}$, and analogously for $tmin^0$.


The object is to design an algorithm for which every execution in which the assumptions above 
hold satisfies the following two properties. 


1. $\gamma$-Agreement: $|L_p(t) - L_q(t)| \le \gamma$, for all $t \ge tmin^0$ and all nonfaulty p, q.

2. $(\alpha_1,\alpha_2,\alpha_3)$-Validity: $\alpha_1(t - tmax^0) + T^0 - \alpha_3 \le L_p(t) \le \alpha_2(t - tmin^0) + T^0 + \alpha_3$ for all $t \ge t_p^0$ and all nonfaulty p.

The Agreement property means that all the nonfaulty processes are synchronized to within $\gamma$. The Validity property means that the local time of a nonfaulty process increases in some relation to real time. We would, of course, like to minimize $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\gamma$.


4.3 Properties of Clocks 


We give several straightforward lemmas about the behavior of ($\rho$-bounded) clocks.
Lemma 4-1: Let C be any clock.

(a) If $t_1 \le t_2$, then
$(1-\rho)(t_2 - t_1) \le (t_2 - t_1)/(1+\rho) \le C(t_2) - C(t_1) \le (1+\rho)(t_2 - t_1) \le (t_2 - t_1)/(1-\rho)$.
(b) If $T_1 \le T_2$, then
$(1-\rho)(T_2 - T_1) \le (T_2 - T_1)/(1+\rho) \le c(T_2) - c(T_1) \le (1+\rho)(T_2 - T_1) \le (T_2 - T_1)/(1-\rho)$.
Proof: Straightforward. ∎
Lemma 4-2: Let C and D be clocks.

(a) If $dC(t)/dt = 1$ and $T_1 \le T_2$, then
$|(c(T_2) - d(T_2)) - (c(T_1) - d(T_1))| = |(c(T_2) - c(T_1)) - (d(T_2) - d(T_1))| \le \rho(T_2 - T_1)$.

(b) If $T_1 \le T_2$, then
$|(c(T_2) - d(T_2)) - (c(T_1) - d(T_1))| = |(c(T_2) - c(T_1)) - (d(T_2) - d(T_1))| \le 2\rho(T_2 - T_1)$.

(c) If $dC(t)/dt = 1$ and $t_1 \le t_2$, then
$|(C(t_2) - D(t_2)) - (C(t_1) - D(t_1))| = |(C(t_2) - C(t_1)) - (D(t_2) - D(t_1))| \le \rho(t_2 - t_1)$.

(d) If $t_1 \le t_2$, then
$|(C(t_2) - D(t_2)) - (C(t_1) - D(t_1))| = |(C(t_2) - C(t_1)) - (D(t_2) - D(t_1))| \le 2\rho(t_2 - t_1)$.
Proof: Straightforward using Lemma 4-1. ∎


Lemma 4-3: Let C and D be clocks, $T_1 \le T_2$. Assume $|c(T) - d(T)| \le \alpha$ for all T, $T_1 \le T \le T_2$. Let $t_1 = \min\{c(T_1), d(T_1)\}$ and $t_2 = \max\{c(T_2), d(T_2)\}$.

Then $|C(t) - D(t)| \le (1+\rho)\alpha$ for all t, $t_1 \le t \le t_2$.
Proof: There are four cases, which can easily be shown to be exhaustive.

Case 1: $c(T_1) \le t \le c(T_2)$.

Let $T_3 = C(t)$, so that $T_1 \le T_3 \le T_2$. By hypothesis, $|c(T_3) - d(T_3)| \le \alpha$. Then $|C(t) - D(t)| = |T_3 - D(t)| = |D(d(T_3)) - D(c(T_3))| \le (1+\rho)\alpha$, by Lemma 4-1.

Case 2: $d(T_1) \le t \le d(T_2)$. This case is analogous to the first.
Case 3: $c(T_2) < t < d(T_1)$.

Then $c(T_1) < t < d(T_1)$. So $C(t) > D(t)$, and thus

$|C(t) - D(t)| = C(t) - D(t) = (C(t) - T_1) + (T_1 - D(t))$

$\le (1+\rho)(t - c(T_1)) + (1+\rho)(d(T_1) - t)$, by Lemma 4-1,

$= (1+\rho)(d(T_1) - c(T_1)) \le (1+\rho)\alpha$.

Case 4: $d(T_2) < t < c(T_1)$. This case is analogous to the third. ∎


4.4 The Algorithm 


4.4.1 General Description 

The algorithm executes in a series of rounds, the i-th round for a process triggered by its logical clock reaching some value $T^i$. (It will be shown that the logical clocks reach this value within real time $\beta$ of each other.) When any process p's logical clock reaches $T^i$, p broadcasts a $T^i$ message. Meanwhile, p collects $T^i$ messages from as many processes as it can, within a particular bounded amount of time, measured on its logical clock. The bounded amount of time is of length $(1+\rho)(\beta + \delta + \varepsilon)$, and is chosen to be just large enough to ensure that $T^i$ messages are received from all nonfaulty processes. After waiting this amount of time, p averages the arrival times of all the $T^i$ messages received, using a particular fault-tolerant averaging function. The resulting average is used to calculate an adjustment to p's correction variable, thereby switching p to a new logical clock.

The process p then waits until its new clock reaches time $T^{i+1} = T^i + P$, and repeats the procedure. P, then, is the length of a round in local time.


The fault-tolerant averaging function is derived from those used in [1] for reaching approximate 
agreement. The function is designed to be immune to some fixed maximum number, f, of faults. It 
first throws out the f highest and f lowest values, and then applies some ordinary averaging 
function to the remaining values. In this paper, we choose the midpoint of the range of the 
remaining values, to be specific. 
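The averaging function described above is short enough to sketch directly. The function names `reduce_values` and `mid` below are illustrative stand-ins, and the sample arrival times are made up; the only facts taken from the text are "discard the f highest and f lowest values" and "take the midpoint of the range of what remains".

```python
def reduce_values(values, f):
    """Drop the f highest and f lowest values, keeping duplicates (a multiset)."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values to discard f from each end")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def mid(values):
    """Midpoint of the range of the given values."""
    return (min(values) + max(values)) / 2


# Seven reported values, two of which (250.0 and -40.0) come from faulty processes.
arrivals = [100.2, 99.9, 100.0, 250.0, 100.1, -40.0, 100.3]
kept = reduce_values(arrivals, f=2)
average = mid(kept)
```

With n = 7 and f = 2 the two outliers are discarded along with one honest value from each end, so the result lies within the range of the values reported by nonfaulty processes.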


4.4.2 Code for an Arbitrary Process 


Global constants: $\rho$, $\beta$, $\delta$, $\varepsilon$, and P, as defined above.


Local variables: 


• CORR, initially arbitrary; correction variable which corrects physical time to logical time.

• ARR[q], initially arbitrary; array containing the arrival times of the most recent messages, one entry for each process q.

• T, initially undefined; local time at which the process next intends to send a message.

Conventions:

• NOW stands for the current logical clock time (i.e., the physical clock reading + CORR). NOW is assumed to be set at the beginning of a step, and cannot be assigned to.

• REDUCE, applied to an array, returns the multiset consisting of the elements of the array, with the f highest and f lowest elements removed.

• MID, applied to a multiset of real numbers, returns the midpoint of the set of values in the multiset.


The code is in Figure 4-1. 




beginstep(u)
do forever

    /* in case T^i messages are received before this process reaches T^i */

    while u = (m,q) for some message m and process q do
        ARR[q] := NOW
        endstep
        beginstep(u)
    endwhile

    /* fall out of the loop when u = START or TIMER; begin round */

    T := NOW
    broadcast(T)
    set-timer(T + (1 + ρ)(β + δ + ε))

    while u = (m,q) for some message m and process q do
        ARR[q] := NOW
        endstep
        beginstep(u)
    endwhile

    /* fall out of the loop when u = TIMER; end round */

    AV := mid(reduce(ARR))
    ADJ := T + δ - AV
    CORR := CORR + ADJ
    set-timer(T + P)
    endstep

    beginstep(u)

enddo

Figure 4-1: Algorithm 4-1, Maintaining Synchronization
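The end-of-round step in Figure 4-1 can be illustrated numerically. The sketch below covers only the last four assignments of the round and is not a model of the full interrupt-driven process; the function name `end_of_round`, the dictionary representation of ARR, and the sample values are assumptions made for illustration.

```python
def end_of_round(T, corr, arr, f, delta):
    """Compute ADJ and the new CORR from the arrival times collected in a round.

    arr maps each process id to the local arrival time of its round message.
    """
    ordered = sorted(arr.values())
    kept = ordered[f:len(ordered) - f]   # REDUCE: drop f lowest and f highest
    av = (kept[0] + kept[-1]) / 2        # MID: midpoint of the remaining range
    adj = T + delta - av                 # ADJ := T + delta - AV
    return corr + adj, adj               # CORR := CORR + ADJ


# Four processes, at most one faulty (n = 4 >= 3f + 1 with f = 1).
arrivals = {"p1": 1012.0, "p2": 1010.0, "p3": 1011.0, "p4": 5000.0}
new_corr, adj = end_of_round(T=1000.0, corr=3.0, arr=arrivals, f=1, delta=10.0)
```

Here the surviving arrival times average to 1011.5, which is 1.5 later than the expected T + δ = 1010 on this process's clock, so the process concludes it is running ahead and moves its clock back by 1.5. The wildly wrong value from p4 is discarded and has no influence.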


4.5 Inductive Analysis 


Although the algorithm is fairly simple, its analysis is surprisingly complicated and requires a long series of lemmas.


4.5.1 Bounds on the Parameters 

We assume that the parameters $\rho$, $\delta$, and $\varepsilon$ are fixed, but that we have some freedom in our choice of P and $\beta$, subject to the reasonableness of our assumption that the clocks are initially synchronized to within $\beta$. We would like $\beta$ to be as small as possible, to keep the clocks as closely synchronized as we can. However, the smaller $\beta$ is, the smaller P must be (i.e., the more frequently we must synchronize).
There is also a lower bound on P. In order for the algorithm to work correctly, we need to have P sufficiently large to ensure the following.

(1) After a nonfaulty process p resets its clock, the local time at which p schedules its next broadcast is greater than the local time on the new clock, at the moment of reset.

(2) A message sent by a nonfaulty process q for a round arrives at a nonfaulty process p after p has already set its clock for that round.

Sufficient bounds on P turn out to be:

$P > 2(1+\rho)(\beta + \varepsilon) + (1+\rho)\max\{\delta, \beta + \varepsilon\} + \rho\delta$, and

$P \le \beta/4\rho - \varepsilon/\rho - \rho(\beta + \delta + \varepsilon) - 2\beta - \delta - 2\varepsilon$.

A required lower bound on $\beta$ is $\beta \ge 4\varepsilon + 4\rho(3\beta + \delta + 3\varepsilon) + \rho(\beta + \delta + \varepsilon)$.

Any combination of P and $\beta$ which satisfies these inequalities will work in our algorithm. If P is regarded as fixed, then $\beta$, the closeness of synchronization along the real time axis, is roughly $4\varepsilon + 4\rho P$. This value is obtained by solving the upper bound on P for $\beta$ and neglecting terms of order $\rho$.
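To get a feel for the admissible range of P, the two bounds can be evaluated numerically. The sketch below simply transcribes the lower and upper bounds on P and the approximation $\beta \approx 4\varepsilon + 4\rho P$; the function names and the sample parameter values (times in seconds) are illustrative assumptions, not values from the thesis.

```python
def round_length_bounds(rho, delta, eps, beta):
    """Return (lower, upper): P must exceed lower and not exceed upper."""
    lower = (2 * (1 + rho) * (beta + eps)
             + (1 + rho) * max(delta, beta + eps)
             + rho * delta)
    upper = (beta / (4 * rho) - eps / rho
             - rho * (beta + delta + eps)
             - 2 * beta - delta - 2 * eps)
    return lower, upper


def approx_beta(rho, eps, P):
    """Rough closeness of synchronization for a fixed round length P."""
    return 4 * eps + 4 * rho * P


# Hypothetical parameters: drift 1e-6, delay 10 ms, uncertainty 1 ms, beta 5 ms.
lower, upper = round_length_bounds(rho=1e-6, delta=0.01, eps=0.001, beta=0.005)
beta_for_250s = approx_beta(rho=1e-6, eps=0.001, P=250.0)
```

With these sample values the round length may be anywhere from a few hundredths of a second up to roughly 250 seconds, and choosing P near the upper end corresponds to $\beta$ of about 5 ms, in line with the approximation.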


4.5.2 Notation 
Let $T^i = T^0 + iP$ and $U^i = T^i + (1+\rho)(\beta + \delta + \varepsilon)$, for all $i \ge 0$.

For each i, every process p broadcasts $T^i$ at its logical clock time $T^i$ (real time $t_p^i$) and sets a timer to go off when its logical clock reaches $U^i$. When the logical clock reaches $U^i$ (at real time $u_p^i$), the process resets its CORR variable, thereby switching to a new logical clock, denoted $C_p^{i+1}$. Also at real time $u_p^i$ the process sets a timer for the time on its physical clock when the new logical clock $C_p^{i+1}$ reaches $T^{i+1}$. It is at least theoretically possible that this new timer might be set for a time on the physical clock which has already passed. If the timer is never set in the past, the process moves through an infinite sequence of clocks $C_p^0$, $C_p^1$, etc., where $C_p^0$ is in force in the interval of real time $(-\infty, u_p^0)$, and each $C_p^i$, $i \ge 1$, is in force in the interval of real time $[u_p^{i-1}, u_p^i)$. If, however, the timer is set in the past at some $u_p^i$, then no further timers arrive after that real time, and no further resynchronizations occur. That is, $C_p^{i+1}$ stays in force forever, and $u_p^j$ and $t_p^j$ are undefined for $j \ge i + 1$.

Let $tmin^i$ denote $\min_{p \text{ nonfaulty}}\{t_p^i\}$, and analogously for $tmax^i$, $umin^i$ and $umax^i$.

For p and q nonfaulty, let $ARR_p^i(q)$ denote the time of arrival of a $T^i$ message from q to p, sent at q's clock time $T^i$, where the arrival time is measured on p's local clock $C_p^i$. (We will prove that $C_p^i$ has actually been set by the time this message arrives.) Let $AV_p^i$ denote the value of AV calculated by p using the $ARR_p^i$ values, and let $ADJ_p^i$ denote the corresponding value of ADJ calculated by p. Thus, $C_p^{i+1} = C_p^i + ADJ_p^i$.


This section is devoted to proving the following three statements for all $i \ge 0$:
(1) The real time $t_p^i$ is defined for all nonfaulty p. (That is, timers are set in the future.)
(2) $|t_p^i - t_q^i| \le \beta$, for all nonfaulty p and q. (That is, the separation of clocks is bounded by $\beta$.)

(3) $t_p^i + \delta - \varepsilon > u_q^{i-1}$, for all nonfaulty p and q, and $i \ge 1$. (That is, messages arrive after the appropriate clocks have been set.)

The proof is by induction. For i = 0, (1) and (2) are true by assumption and (3) is vacuously true.

Throughout the rest of this section, we assume (1), (2), and (3) hold for i. We show (1), (2), and (3) for i + 1 after bounding the size of the adjustment at each round.


4.5.3 Bounding the Adjustment
In this subsection, we prove several lemmas leading up to a bound on the amount of adjustment
made by a nonfaulty process to its clock, at each time of resynchronization.

Lemma 4-4: Let p and q be nonfaulty.
(a) $ARR^i_p(q) \le T^i + (1+\rho)(\beta + \delta + \varepsilon)$.
(b) If $\delta - \varepsilon \ge \beta$, then $ARR^i_p(q) \ge T^i + (1-\rho)(\delta - \varepsilon - \beta)$.
(c) If $\delta - \varepsilon < \beta$, then $ARR^i_p(q) \ge T^i - (1+\rho)(\beta - \delta + \varepsilon)$.
Proof: Straightforward using Lemma 4-1. ∎
Lemma 4-5: Let p be nonfaulty. Then there exist nonfaulty q and r with
$ARR^i_p(q) \le AV^i_p \le ARR^i_p(r)$.
Proof: By throwing out the f highest and f lowest values, the process ensures that the
remaining values are in the range of the nonfaulty processes' values. ∎
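
The argument of Lemma 4-5 can be illustrated with a small sketch of the averaging function mid(reduce(·)). The function names follow the text; the sample arrival times and the helper name `fault_tolerant_average` are illustrative assumptions, not values or names from the thesis.

```python
def reduce(values, f):
    """Discard the f smallest and f largest entries of a multiset."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def mid(values):
    """Midpoint of the range spanned by a nonempty multiset."""
    return (min(values) + max(values)) / 2


def fault_tolerant_average(arrivals, f):
    """AV as described in the text: the midpoint of the reduced multiset."""
    return mid(reduce(arrivals, f))


# Hypothetical arrival times for n = 7, f = 2: five nonfaulty senders
# plus two faulty senders reporting wildly wrong values.
nonfaulty = [100.2, 100.5, 100.9, 101.0, 101.3]
faulty = [-1e9, 1e9]
av = fault_tolerant_average(nonfaulty + faulty, f=2)
```

Whatever the two faulty entries are, the reduced multiset lies inside the range of the nonfaulty values, which is the property the lemma relies on.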


We are now able to bound the adjustment. 


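Lemma 4-6: Let p be nonfaulty. Then $|ADJ^i_p| \le (1+\rho)(\beta + \varepsilon) + \rho\delta$.
Proof: $ADJ^i_p = T^i + \delta - AV^i_p$.
Thus, for some nonfaulty q and r, Lemma 4-5 implies that
$T^i + \delta - ARR^i_p(q) \le ADJ^i_p \le T^i + \delta - ARR^i_p(r)$.
Then Lemma 4-4 implies that:
(a) $ADJ^i_p \ge T^i + \delta - (T^i + (1+\rho)(\beta + \delta + \varepsilon)) = -(1+\rho)(\beta + \varepsilon) - \rho\delta$.
(b) If $\delta - \varepsilon \ge \beta$, then $ADJ^i_p \le T^i + \delta - (T^i + (1-\rho)(\delta - \varepsilon - \beta)) = (1-\rho)(\beta + \varepsilon) + \rho\delta$.
(c) If $\delta - \varepsilon < \beta$, then $ADJ^i_p \le T^i + \delta - (T^i - (1+\rho)(\beta - \delta + \varepsilon)) = (1+\rho)(\beta + \varepsilon) - \rho\delta$.

The conclusion is immediate. ∎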


4.5.4 Timers Are Set in the Future 
Earlier, we gave a lower bound on P and described two conditions which that bound was
supposed to guarantee (that timers are set in the future and that messages arrive after the
appropriate clocks have been set). In this subsection, we show that the given bound on P is
sufficient to guarantee that the first of these two conditions holds.

Lemma 4-7: Let p be nonfaulty. Then $U^i + ADJ^i_p < T^{i+1}$.

Proof:
$U^i + ADJ^i_p \le U^i + (1+\rho)(\beta + \varepsilon) + \rho\delta$, by Lemma 4-6
$= U^i + (2(1+\rho)(\beta + \varepsilon) + (1+\rho)\delta + \rho\delta) - (1+\rho)(\beta + \delta + \varepsilon)$
$< U^i + P - (1+\rho)(\beta + \delta + \varepsilon)$, by the assumed lower bound on P
$= T^{i+1}$. ∎


This lemma implies that timers are set in the future and that $t^{i+1}_p$ is defined, the first of the three
inductive properties which we must verify.


4.5.5 Bounding the Separation of Clocks 

Next, we prove several lemmas which lead to bounds on the distance between the new clocks of
nonfaulty processes. The first lemma gives an upper bound on the error in a process' estimate of
the difference in real time between its own clock and another nonfaulty process' clock reaching
$T^i$.

Lemma 4-8: Let p, q and r be nonfaulty. Then
$|(ARR^i_p(q) - (T^i + \delta)) - (c^i_q(T^i) - c^i_p(T^i))| \le \varepsilon + \rho(\beta + \delta + \varepsilon)$.
Proof: Let a be the real time of arrival of q's message at process p. Then a is at most
$c^i_q(T^i) + \delta + \varepsilon$. Define a new auxiliary clock, D, with rate exactly equal to 1, and such
that $D(a) = C^i_p(a)$. Thus, $ARR^i_p(q) = D(a)$. So the expression we want to bound is at
most equal to:
$|(D(a) - (T^i + \delta)) - (c^i_q(T^i) - d(T^i))| + |c^i_p(T^i) - d(T^i)|$.
First we demonstrate that the first of these two terms is at most $\varepsilon$.
$|D(a) - (T^i + \delta) - c^i_q(T^i) + d(T^i)|$
$= |a - d(T^i + \delta) - c^i_q(T^i) + d(T^i)|$, since D has rate 1
$= |a - (c^i_q(T^i) + \delta)|$
$\le |c^i_q(T^i) + \delta + \varepsilon - c^i_q(T^i) - \delta|$
$= \varepsilon$.
Next we show that the second term, $|c^i_p(T^i) - d(T^i)|$, is at most $\rho(\beta + \delta + \varepsilon)$.

Case 1: $c^i_p(T^i) \le a$. So p reaches $T^i$ before q's message arrives.
Let $\gamma = a - c^i_p(T^i)$. Then $\gamma \le \beta + \delta + \varepsilon$.

Subcase 1a: $d(T^i) \ge c^i_p(T^i)$. So $C^i_p$ has rate slower than real time.
Then $d(T^i) - c^i_p(T^i)$ is largest when $C^i_p$ goes at the slowest possible rate, $1/(1+\rho)$. In
this case, $d(T^i) - c^i_p(T^i) = \gamma - (a - d(T^i))$, where $a - d(T^i) = \gamma/(1+\rho)$. Thus,
$d(T^i) - c^i_p(T^i) = \gamma(1 - 1/(1+\rho)) = \gamma\rho/(1+\rho) \le \gamma\rho \le \rho(\beta + \delta + \varepsilon)$.

Subcase 1b: $d(T^i) < c^i_p(T^i)$. So $C^i_p$ has rate faster than real time.
Then $c^i_p(T^i) - d(T^i)$ is largest when $C^i_p$ goes at the fastest possible rate, $1 + \rho$. Then
$c^i_p(T^i) - d(T^i) = (1+\rho)\gamma - \gamma = \gamma\rho \le \rho(\beta + \delta + \varepsilon)$.

Case 2: $c^i_p(T^i) > a$. So p reaches $T^i$ after q's message arrives.
Let $\gamma = c^i_p(T^i) - a$. Then $\gamma \le \beta - \delta + \varepsilon$.

Subcase 2a: $d(T^i) \ge c^i_p(T^i)$. So $C^i_p$ has rate faster than real time.
An argument similar to that for case 1b shows that $d(T^i) - c^i_p(T^i) \le \gamma\rho \le \rho(\beta - \delta + \varepsilon)$,
which suffices.

Subcase 2b: $d(T^i) < c^i_p(T^i)$. So $C^i_p$ has rate slower than real time.
An argument similar to that for case 1a shows that $c^i_p(T^i) - d(T^i) \le \gamma\rho \le \rho(\beta - \delta + \varepsilon)$,
which suffices. ∎


In order to prove the next lemma, we use some results about multisets, which are presented in the
Appendix. This is a key lemma because the distance between the clocks is reduced from $\beta$ to
$\beta/2$, roughly. The halving is due to the properties of the fault-tolerant averaging function used in
the algorithm. Consequently, the averaging function can be considered the heart of the
algorithm.

Lemma 4-9: Let p and q be nonfaulty. Then
$|(c^i_p(T^i) - c^i_q(T^i)) - (ADJ^i_p - ADJ^i_q)| \le \beta/2 + 2\varepsilon + 2\rho(\beta + \delta + \varepsilon)$.

Proof: We define multisets U, V, and W, and show they satisfy the hypotheses of
Lemma A-4. Let

$U = c^i_p(T^i) - (T^i + \delta) + ARR^i_p$,

$V = c^i_q(T^i) - (T^i + \delta) + ARR^i_q$, and

$W = \{c^i_r(T^i) : r \text{ is nonfaulty}\}$.

U and V have size n and W has size n - f.

Let $x = \varepsilon + \rho(\beta + \delta + \varepsilon)$.

Define an injection from W to U as follows. Map each element $c^i_r(T^i)$ in W to
$c^i_p(T^i) - (T^i + \delta) + ARR^i_p(r)$ in U. Since Lemma 4-8 implies that
$|(ARR^i_p(r) - (T^i + \delta)) - (c^i_r(T^i) - c^i_p(T^i))| \le \varepsilon + \rho(\beta + \delta + \varepsilon)$
for all the elements of W, $d_x(W,U) = 0$. Similarly, $d_x(W,V) = 0$.

Since any two nonfaulty processes reach $T^i$ within $\beta$ real time of each other, $diam(W) \le \beta$.

By Lemma A-4, $|mid(reduce(U)) - mid(reduce(V))| \le \beta/2 + 2\varepsilon + 2\rho(\beta + \delta + \varepsilon)$.

Since $mid(reduce(U)) = mid(reduce(c^i_p(T^i) - (T^i + \delta) + ARR^i_p)) = c^i_p(T^i) - ADJ^i_p$ and
similarly $mid(reduce(V)) = c^i_q(T^i) - ADJ^i_q$, the result follows. ∎
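
The halving property used in this proof can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the helper `worst_gap`, the sampling ranges, and the parameter values are hypothetical, and a random search does not replace the proof of Lemma A-4. It builds two views U and V of a common multiset W, each consisting of the W values perturbed by at most x plus f arbitrary entries, and records the largest observed gap between the two averages.

```python
import random


def reduce(values, f):
    """Discard the f smallest and f largest entries."""
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def mid(values):
    """Midpoint of the range of a nonempty multiset."""
    return (min(values) + max(values)) / 2


def worst_gap(n, f, diam, x, trials=2000, seed=0):
    """Largest |mid(reduce(U)) - mid(reduce(V))| seen over random trials."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        # W: the n - f values associated with nonfaulty processes.
        w = [rng.uniform(0.0, diam) for _ in range(n - f)]

        def view():
            # Each W value shifted by at most x, plus f arbitrary values.
            close = [v + rng.uniform(-x, x) for v in w]
            arbitrary = [rng.uniform(-100.0, 100.0) for _ in range(f)]
            return close + arbitrary

        gap = abs(mid(reduce(view(), f)) - mid(reduce(view(), f)))
        worst = max(worst, gap)
    return worst


bound = 1.0 / 2 + 2 * 0.05          # diam(W)/2 + 2x
observed = worst_gap(n=7, f=2, diam=1.0, x=0.05)
```

With n = 7 and f = 2 the condition n ≥ 3f + 1 holds, so every sampled gap should stay within diam(W)/2 + 2x.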


The next lemma is analogous to the previous one, except that it involves $U^i$ instead of $T^i$.

Lemma 4-10: Let p and q be nonfaulty. Then
$|(c^i_p(U^i) - c^i_q(U^i)) - (ADJ^i_p - ADJ^i_q)| \le \beta/2 + 2\varepsilon + 2\rho(2 + \rho)(\beta + \delta + \varepsilon)$.
Proof: The given expression is
$\le |(c^i_p(T^i) - c^i_q(T^i)) - (ADJ^i_p - ADJ^i_q)| + |(c^i_p(U^i) - c^i_q(U^i)) - (c^i_p(T^i) - c^i_q(T^i))|$
$\le \beta/2 + 2\varepsilon + 2\rho(\beta + \delta + \varepsilon) + 2\rho(1 + \rho)(\beta + \delta + \varepsilon)$, by Lemmas 4-9 and 4-2.
This reduces to the claimed expression. ∎


Next we bound the distance in real time between two nonfaulty processes switching to their new
clocks. It is crucial that the distance between the new clocks reaching $U^i$ be less than $\beta$ in order
to accommodate their relative drift during the interval between $U^i$ and $T^{i+1}$.

Lemma 4-11: Let p, q be nonfaulty. Then
$|c^{i+1}_p(U^i) - c^{i+1}_q(U^i)| \le \beta/2 + 2\varepsilon + 2\rho(3\beta + 2\delta + 3\varepsilon) + 4\rho^2(\beta + \delta + \varepsilon)$.
Proof: We define idealized clocks, $D_p$ and $D_q$, as follows. Both have rate exactly 1.
Also, $D_p(u^i_p) = C^{i+1}_p(u^i_p) = U^i + ADJ^i_p$, and similarly for q. Then
$|c^{i+1}_p(U^i) - c^{i+1}_q(U^i)| \le |c^{i+1}_p(U^i) - d_p(U^i)| + |d_p(U^i) - d_q(U^i)| + |d_q(U^i) - c^{i+1}_q(U^i)|$.
We bound each of these three terms separately.

First, consider $|c^{i+1}_p(U^i) - d_p(U^i)|$. Now, $U^i + ADJ^i_p = D_p(u^i_p) = C^{i+1}_p(u^i_p)$. So
$|c^{i+1}_p(U^i) - d_p(U^i)| \le |(c^{i+1}_p(U^i) - d_p(U^i)) - (c^{i+1}_p(U^i + ADJ^i_p) - d_p(U^i + ADJ^i_p))|$
$\le \rho|ADJ^i_p|$, by Lemma 4-2
$\le \rho((1+\rho)(\beta + \varepsilon) + \rho\delta)$, by Lemma 4-6.
The same bound holds for the third term.

Finally, consider the middle term, $|d_p(U^i) - d_q(U^i)|$. We know that
$d_p(U^i) = d_p(U^i + ADJ^i_p) - ADJ^i_p = u^i_p - ADJ^i_p$, and similarly for q. So
$|d_p(U^i) - d_q(U^i)| = |(u^i_p - u^i_q) - (ADJ^i_p - ADJ^i_q)|$
$\le \beta/2 + 2\varepsilon + 2\rho(2 + \rho)(\beta + \delta + \varepsilon)$, by Lemma 4-10.
Combining these three bounds, we get the required bound. ∎
Finally, we can show the second of our inductive properties, bounding the distance between
times when clocks reach $T^{i+1}$.

Lemma 4-12: Let p, q be nonfaulty. Then $|t^{i+1}_p - t^{i+1}_q| \le \beta$.
Proof:
$|t^{i+1}_p - t^{i+1}_q| = |c^{i+1}_p(T^{i+1}) - c^{i+1}_q(T^{i+1})|$
$\le |(c^{i+1}_p(T^{i+1}) - c^{i+1}_q(T^{i+1})) - (c^{i+1}_p(U^i) - c^{i+1}_q(U^i))| + |c^{i+1}_p(U^i) - c^{i+1}_q(U^i)|$
$\le 2\rho(P - (1+\rho)(\beta + \delta + \varepsilon)) + \beta/2 + 2\varepsilon + 2\rho(3\beta + 2\delta + 3\varepsilon) + 4\rho^2(\beta + \delta + \varepsilon)$, by
Lemmas 4-2 and 4-11.
The assumed upper bound on P implies that this expression is at most $\beta$. ∎
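
To make the role of the upper bound on P concrete, the expression from the proof of Lemma 4-12 can be evaluated directly. The sketch below is an illustration only: the closed form in `max_round_length` is obtained here by solving that expression for P when it equals β (it is a rearrangement made for this example, not a formula quoted from the section), and the numeric parameter values are hypothetical.

```python
def separation_at_next_round(P, beta, delta, eps, rho):
    """Bound from the proof of Lemma 4-12 on |t_p^{i+1} - t_q^{i+1}|."""
    s = beta + delta + eps
    drift = 2 * rho * (P - (1 + rho) * s)                   # Lemma 4-2 term
    after_reset = (beta / 2 + 2 * eps
                   + 2 * rho * (3 * beta + 2 * delta + 3 * eps)
                   + 4 * rho**2 * s)                        # Lemma 4-11 term
    return drift + after_reset


def max_round_length(beta, delta, eps, rho):
    """Largest P for which the bound above does not exceed beta
    (derived for this sketch by solving the inequality for P)."""
    s = beta + delta + eps
    return beta / (4 * rho) - eps / rho - rho * s - 2 * beta - delta - 2 * eps


# Hypothetical parameters, in seconds.
beta, delta, eps, rho = 0.005, 0.01, 0.001, 1e-6
p_max = max_round_length(beta, delta, eps, rho)
at_limit = separation_at_next_round(p_max, beta, delta, eps, rho)
```

At P equal to `p_max` the bound meets β exactly, and any shorter round leaves slack, which is how the upper bound on P closes the induction.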


4.5.6 Bound on Message Arrival Time 
In this subsection, we show that the third and final inductive assumption holds. That is, we show
that messages arrive after the appropriate clocks have been set.

Lemma 4-13: Let p and q be nonfaulty. Then $t^{i+1}_p + \delta - \varepsilon > u^i_q$.
Proof: Since $t^{i+1}_p + \delta - \varepsilon \ge t^{i+1}_q - \beta + \delta - \varepsilon$, it suffices to show that
$t^{i+1}_q - u^i_q > \beta - \delta + \varepsilon$.
Now, $t^{i+1}_q - u^i_q \ge (P - (1+\rho)(\beta + \delta + \varepsilon) - ADJ^i_q)/(1+\rho)$, since the numerator
represents the smallest possible difference in the values of the clock $C^{i+1}_q$ at the two
given real times.

But the lower bound on P implies that $P > 3(1+\rho)(\beta + \varepsilon) + \rho\delta$. Also, the bound on
the adjustment shows that $ADJ^i_q \le (1+\rho)(\beta + \varepsilon) + \rho\delta$. Therefore,
$t^{i+1}_q - u^i_q > (3(1+\rho)(\beta + \varepsilon) + \rho\delta - (1+\rho)(\beta + \delta + \varepsilon) - (1+\rho)(\beta + \varepsilon) - \rho\delta)/(1+\rho)$
$= \beta - \delta + \varepsilon$, as needed. ∎


Thus, we have shown that the three inductive hypotheses hold. Therefore, the claims made in this 
section for a particular i, in fact hold for all i. 


4.6 Some General Properties 


In this section, we state several consequences of the results proved in the preceding section. 


First, we state a bound on the closeness with which the various clocks reach corresponding 


values. 


Lemma 4-14: Let p, q be nonfaulty, $i \ge 0$. Assume that T is chosen so that $U^{i-1} \le T \le U^i$
if $i \ge 1$, or so that $T^0 \le T \le U^0$ if i = 0.
Then $|c^i_p(T) - c^i_q(T)| \le \beta + 2\rho(1+\rho)(\beta + \delta + \varepsilon)$.
Proof: Basis: i = 0. Then $T^0 \le T \le U^0$.
$|c^0_p(T) - c^0_q(T)| \le |(c^0_p(T) - c^0_q(T)) - (c^0_p(T^0) - c^0_q(T^0))| + |c^0_p(T^0) - c^0_q(T^0)|$
$\le 2\rho(T - T^0) + \beta$, by Lemma 4-2 and assumption 3
$\le \beta + 2\rho(1+\rho)(\beta + \delta + \varepsilon)$.

Induction: i > 0. Choose T with $U^{i-1} \le T \le U^i$.
$|c^i_p(T) - c^i_q(T)| \le |(c^i_p(T) - c^i_q(T)) - (c^i_p(U^{i-1}) - c^i_q(U^{i-1}))| + |c^i_p(U^{i-1}) - c^i_q(U^{i-1})|$
$\le 2\rho P + \beta/2 + 2\varepsilon + 2\rho(3\beta + 2\delta + 3\varepsilon) + 4\rho^2(\beta + \delta + \varepsilon)$, by Lemmas 4-2 and 4-11.
The upper bound on P implies the result. ∎


Next, we prove a bound for a nonfaulty process' (i+1)-st clock, in terms of nonfaulty processes'
i-th clocks.

Lemma 4-15: Let p be nonfaulty, $i \ge 0$. Then there exist nonfaulty processes, q and
r, such that for $u^i_p \le t \le umax^i$,
$C^i_q(t) - \alpha \le C^{i+1}_p(t) \le C^i_r(t) + \alpha$,
where $\alpha = \varepsilon + \rho(4\beta + \delta + 5\varepsilon) + 4\rho^2(\beta + \delta + \varepsilon) + 2\rho^3(\beta + \delta + \varepsilon)$.

Proof: $C^{i+1}_p(t) = C^i_p(t) + T^i + \delta - AV^i_p$. Therefore, by Lemma 4-5 there are nonfaulty
processes, q and r, for which
$C^i_p(t) + T^i + \delta - ARR^i_p(q) \le C^{i+1}_p(t) \le C^i_p(t) + T^i + \delta - ARR^i_p(r)$.
We show the right-hand inequality first. Let $a = c^i_p(ARR^i_p(r))$, the real time at which
the message arrives at p from r. Thus, $C^i_p(a) = ARR^i_p(r)$. Note that
$C^i_r(a) \ge T^i + (1-\rho)(\delta - \varepsilon)$.
$C^{i+1}_p(t) \le C^i_p(t) + T^i + \delta - ARR^i_p(r)$, from above
$\le C^i_r(t) + C^i_p(a) - C^i_r(a) + T^i + \delta - ARR^i_p(r) + |(C^i_p(t) - C^i_r(t)) - (C^i_p(a) - C^i_r(a))|$
$\le C^i_r(t) + C^i_p(a) - C^i_r(a) + T^i + \delta - ARR^i_p(r) + 2\rho(t - a)$, by Lemma 4-2 since $t \ge a$
$\le C^i_r(t) + ARR^i_p(r) - T^i - (1-\rho)(\delta - \varepsilon) + T^i + \delta - ARR^i_p(r) + 2\rho(t - a)$
$= C^i_r(t) + \varepsilon + \rho\delta - \rho\varepsilon + 2\rho(t - a)$.

It remains to bound t - a. The worst case occurs when $t = umax^i$. The longest
possible elapsed real time between a particular nonfaulty process reaching $T^i$ and $U^i$
on the same clock is $(1+\rho)^2(\beta + \delta + \varepsilon)$. Thus, $umax^i - tmin^i \le \beta + (1+\rho)^2(\beta + \delta + \varepsilon)$.
But $a \ge tmin^i + \delta - \varepsilon$. Therefore, $t - a \le \beta + (1+\rho)^2(\beta + \delta + \varepsilon) - \delta + \varepsilon$.
Thus,
$C^{i+1}_p(t) \le C^i_r(t) + \varepsilon + \rho\delta - \rho\varepsilon + 2\rho(\beta + (1+\rho)^2(\beta + \delta + \varepsilon) - \delta + \varepsilon)$
$= C^i_r(t) + \varepsilon + \rho(4\beta + \delta + 3\varepsilon) + 4\rho^2(\beta + \delta + \varepsilon) + 2\rho^3(\beta + \delta + \varepsilon)$
$\le C^i_r(t) + \alpha$.

For the left-hand inequality, we see that $C^i_q(t) - \varepsilon - \rho\delta - \rho\varepsilon - 2\rho(t - a) \le C^{i+1}_p(t)$, where
$a = c^i_p(ARR^i_p(q))$. The factor t - a is bounded exactly as before, so that we obtain:
$C^i_q(t) - \alpha \le C^{i+1}_p(t)$. ∎


4.7 Agreement and Validity Conditions 


We are now ready to show that the agreement and validity properties hold. The main effort is in 
restating bounds proved earlier concerning the closeness in real times when clocks reach the 


same value, in terms of the closeness of clock values at the same real time. 


4.7.1 Agreement 
The first lemma implies that the local times of two nonfaulty processes are close in those intervals
where both use a clock with the same index.

Lemma 4-16: Let p, q be nonfaulty. Then
$|C^i_p(t) - C^i_q(t)| \le (1+\rho)(\beta + 2\rho(1+\rho)(\beta + \delta + \varepsilon))$
for $\max\{u^{i-1}_p, u^{i-1}_q\} \le t \le \max\{u^i_p, u^i_q\}$ if $i \ge 1$,
and for $\min\{t^0_p, t^0_q\} \le t \le \max\{u^0_p, u^0_q\}$ if i = 0.
Proof: Basis: i = 0. Lemma 4-14 implies that
$|c^i_p(T) - c^i_q(T)| \le \beta + 2\rho(1+\rho)(\beta + \delta + \varepsilon)$
for all T, $U^{i-1} \le T \le U^i$ if $i \ge 1$, and for all T, $T^0 \le T \le U^0$ if i = 0. Then Lemma 4-3
immediately implies the needed result for i = 0.

Induction: $i \ge 1$. Lemma 4-3 implies the result for all t with
$\min\{c^i_p(U^{i-1}), c^i_q(U^{i-1})\} \le t \le \max\{u^i_p, u^i_q\}$.

It remains to show the bound for t with
$\max\{u^{i-1}_p, u^{i-1}_q\} \le t < \min\{c^i_p(U^{i-1}), c^i_q(U^{i-1})\}$.

Without loss of generality, assume that $c^i_p(U^{i-1}) \le c^i_q(U^{i-1})$, so that the minimum is
equal to $c^i_p(U^{i-1})$.

$|C^i_p(t) - C^i_q(t)| \le |(C^i_p(t) - C^i_q(t)) - (C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1})))|$
$+ |C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1}))|$.

The first term, by Lemma 4-2, is at most $2\rho(c^i_p(U^{i-1}) - t)$. Since
$t \ge \max\{u^{i-1}_p, u^{i-1}_q\} \ge u^{i-1}_p = c^{i-1}_p(U^{i-1})$, we have
$2\rho(c^i_p(U^{i-1}) - t) \le 2\rho(c^i_p(U^{i-1}) - c^{i-1}_p(U^{i-1}))$.
Since $c^{i-1}_p(U^{i-1}) = c^i_p(T)$ for some T with $|T - U^{i-1}| \le |ADJ^{i-1}_p|$, this quantity is
$\le 2\rho|c^i_p(U^{i-1}) - c^i_p(T)|$
$\le 2\rho(1+\rho)|U^{i-1} - T|$, by Lemma 4-1
$\le 2\rho(1+\rho)|ADJ^{i-1}_p|$
$\le 2\rho(1+\rho)((1+\rho)(\beta + \varepsilon) + \rho\delta)$, by Lemma 4-6.

To bound the second term we note that Lemma 4-11 implies that
$|c^i_p(U^{i-1}) - c^i_q(U^{i-1})| \le \beta/2 + 2\varepsilon + 2\rho(3\beta + 2\delta + 3\varepsilon) + 4\rho^2(\beta + \delta + \varepsilon) = a$,
and so Lemma 4-3, with $T_1 = T_2 = U^{i-1}$, implies that
$|C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1}))| \le (1+\rho)a$.

The assumed lower bound on $\beta$ gives the result that
$2\rho(1+\rho)((1+\rho)(\beta + \varepsilon) + \rho\delta) + (1+\rho)a \le (1+\rho)(\beta + 2\rho(1+\rho)(\beta + \delta + \varepsilon))$. ∎


Here is the main result, bounding the error in the synchronization at any time.

Theorem 4-17: The algorithm guarantees $\gamma$-agreement,
where $\gamma = \beta + \varepsilon + \rho(7\beta + 3\delta + 7\varepsilon) + 8\rho^2(\beta + \delta + \varepsilon) + 4\rho^3(\beta + \delta + \varepsilon)$.
Proof: The result for intervals in which the processes use clocks with the same indices
has been covered in the preceding lemma. The expression in the statement of that
lemma simplifies to
$\beta + \rho(3\beta + 2\delta + 2\varepsilon) + 4\rho^2(\beta + \delta + \varepsilon) + 2\rho^3(\beta + \delta + \varepsilon)$,
which is less than $\gamma$.
Next, we must consider the case where one of the processes has changed to a new
clock, while the other still retains the old clock. Consider $|C^{i+1}_p(t) - C^i_q(t)|$ for some t
with $u^i_p \le t \le u^i_q$. Lemma 4-15 implies that there exist nonfaulty processes r and s
such that
$C^i_r(t) - \alpha \le C^{i+1}_p(t) \le C^i_s(t) + \alpha$,
where $\alpha = \varepsilon + \rho(4\beta + \delta + 5\varepsilon) + 4\rho^2(\beta + \delta + \varepsilon) + 2\rho^3(\beta + \delta + \varepsilon)$. So
$|C^{i+1}_p(t) - C^i_q(t)| \le \alpha + \max\{|C^i_r(t) - C^i_q(t)|, |C^i_s(t) - C^i_q(t)|\}$
$\le \alpha + (1+\rho)(\beta + 2\rho(1+\rho)(\beta + \delta + \varepsilon))$, by the preceding lemma
$= \beta + \varepsilon + \rho(7\beta + 3\delta + 7\varepsilon) + 8\rho^2(\beta + \delta + \varepsilon) + 4\rho^3(\beta + \delta + \varepsilon)$, as needed. ∎
In some applications, it may never be the case that clocks with different indices are compared,
perhaps because use of the clocks for processing ceases during the interval in which confusion is
possible. In that case, the closeness of synchronization achieved by Algorithm 4-1 is given by
Lemma 4-16, and is approximately $\beta + \rho(3\beta + 2\delta + 2\varepsilon)$. This value is more than $\varepsilon$ less than the
bound obtained when clocks with different indices must be compared.


Now we can sketch why it is reasonable for $\beta$ to be approximately $4\varepsilon + 4\rho P$, as mentioned at the
end of Section 4.5.1. Assume P is fixed. The i-th clocks reach $T^i$ within $\beta$ of each other. After the
processes reset their clocks, the new clocks reach $U^i$ within $\beta/2 + 2\varepsilon$ (ignoring $\rho$ terms). By the
end of the round, the clocks reach $T^{i+1}$ within about $\beta/2 + 2\varepsilon + 2\rho P$ of each other, because of
drift. This quantity must be at most $\beta$. The inequality $\beta/2 + 2\varepsilon + 2\rho P \le \beta$ yields $\beta \ge 4\varepsilon + 4\rho P$.


Suppose we alter the algorithm so that during each round, the processes exchange clock values
k times instead of just once. Then we get $\beta/2^k + (4 - 2^{2-k})\varepsilon + 2\rho P \le \beta$, which simplifies to
$\beta \ge 4\varepsilon + 2\rho P\,(2^k/(2^k - 1))$. It appears that $\beta \ge 4\varepsilon + 2\rho P$ is approachable.
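
The simplification for k exchanges per round can be checked numerically. In the sketch below the function names and the sample values of ε, ρ and P are illustrative assumptions; the two formulas are the ones stated in the paragraph above.

```python
def beta_requirement(k, eps, rho, P):
    """Smallest beta allowed when clock values are exchanged k times per round."""
    return 4 * eps + 2 * rho * P * (2**k / (2**k - 1))


def residual(k, beta, eps, rho, P):
    """Left side minus right side of
    beta/2^k + (4 - 2^(2-k)) eps + 2 rho P <= beta."""
    return beta / 2**k + (4 - 2 ** (2 - k)) * eps + 2 * rho * P - beta


# Hypothetical parameters.
eps, rho, P = 0.001, 1e-6, 100.0
thresholds = [beta_requirement(k, eps, rho, P) for k in range(1, 11)]
```

For k = 1 the threshold equals 4ε + 4ρP, and as k grows it decreases toward 4ε + 2ρP, which is the sense in which that value is approachable.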


If the number of processes, n, increases while f, the number of faulty processes, remains fixed, a
greater closeness of synchronization can be achieved by modifying Algorithm 4-1 so that it
computes the mean instead of the midpoint of the range of values.


As in [1], we show that the convergence rate of algorithms that use the mean instead of the
midpoint is roughly f/(n - 2f).


The result is based on the following lemma concerning multisets.

Lemma 4-18: Let U, V, and W be multisets such that $|U| = |V| = n \ge 3f + 1$ and
$|W| = n - f$. If $d_x(W,U) = d_x(W,V) = 0$, then
$|mean(reduce(U)) - mean(reduce(V))| \le diam(W)\,f/(n - 2f) + 2x$.


The analysis of the modified Algorithm 4-1 parallels that just presented. However, the upper
bound on P becomes

$P \le \beta(n - 3f)/(2\rho(n - 2f)) - \varepsilon/\rho - \rho(\beta + \delta + \varepsilon) - 2\beta - \delta - 2\varepsilon$.


This bound implies $\beta \ge 2(n - 2f)(\varepsilon + \rho P)/(n - 3f)$, which approaches $\beta \ge 2\varepsilon + 2\rho P$ as n
approaches infinity.


We now demonstrate that this bound is reasonable. After updating the clock and then waiting
until the clocks reach the next $T^i$, the clocks must still be within $\beta$, giving
$f\beta/(n - 2f) + 2\varepsilon + 2\rho P \le \beta$, which implies $\beta \ge (2\varepsilon + 2\rho P)(n - 2f)/(n - 3f)$, which
approaches $2\varepsilon + 2\rho P$ as n approaches infinity.
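
The effect of replacing the midpoint by the mean can be tabulated with a short sketch. The function names and the sample values of ε, ρ and P are assumptions made for illustration; the formulas are the approximate requirements on β stated in this subsection (ignoring higher-order ρ terms).

```python
def beta_floor_midpoint(eps, rho, P):
    """Approximate requirement on beta for the midpoint version."""
    return 4 * eps + 4 * rho * P


def beta_floor_mean(n, f, eps, rho, P):
    """Approximate requirement on beta for the mean version."""
    if n <= 3 * f:
        raise ValueError("the analysis assumes n >= 3f + 1")
    return (2 * eps + 2 * rho * P) * (n - 2 * f) / (n - 3 * f)


# Hypothetical parameters; f stays fixed while n grows.
eps, rho, P, f = 0.001, 1e-6, 100.0, 1
floors = {n: beta_floor_mean(n, f, eps, rho, P) for n in (4, 7, 10, 100, 1000)}
```

With n = 3f + 1 the mean version gives the same requirement as the midpoint version, and the requirement shrinks toward 2ε + 2ρP as n grows with f fixed.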


4.7.2 Validity 

Next, we show the validity condition. The first lemma bounds the values of the zero-index clocks.
Lemma 4-19: $T^0 + (1-\rho)(t - t^0_p) \le C^0_p(t) \le T^0 + (1+\rho)(t - t^0_p)$ for $t \ge t^0_p$.
Proof: By Lemma 4-1. ∎


The next lemma is the main one. 
Lemma 4-20: Let p be nonfaulty, $i \ge 0$. Then
$(1-\rho)(t - tmax^0) + T^0 - i\varepsilon \le C^i_p(t) \le (1+\rho)(t - tmin^0) + T^0 + i\varepsilon$
for all $t \ge u^{i-1}_p$ if $i \ge 1$, and for all $t \ge t^0_p$ if i = 0.

Proof: We proceed by induction on i. When proving the result for i + 1, we will
assume the result for i, for all executions of the algorithm (rather than just the
execution in question).

Basis: i = 0. This case follows immediately by Lemma 4-19.

Induction: Assume the result has been shown for i and show it for i + 1.

We argue the right-hand inequality first. The left-hand inequality is entirely analogous.
Assume in contradiction that we have a particular execution in which
$C^{i+1}_p(t) > (1+\rho)(t - tmin^0) + T^0 + (i+1)\varepsilon$ for some $t \ge u^i_p$. Then by the limitations on rates of
clocks, it is clear that $C^{i+1}_p(u^i_p) > (1+\rho)(u^i_p - tmin^0) + T^0 + (i+1)\varepsilon$.

Recall that p resets its clock at real time $u^i_p$ by adding $T^i + \delta - AV^i_p$. In this case, the
inductive hypothesis implies that the adjustment must be an increment.

By Lemma 4-5, this increment is $\le T^i + \delta - ARR^i_p(q)$ for some nonfaulty q. Therefore,
$C^i_p(u^i_p) + T^i + \delta - ARR^i_p(q) > (1+\rho)(u^i_p - tmin^0) + T^0 + (i+1)\varepsilon$.

Next, we claim that if p had done the adjustment just when the message arrived from q
rather than waiting till real time $u^i_p$, the bound would still have been exceeded. That is,
$ARR^i_p(q) + T^i + \delta - ARR^i_p(q) > (1+\rho)(t' - tmin^0) + T^0 + (i+1)\varepsilon$, where
$t' = c^i_p(ARR^i_p(q))$. (This again follows by the limits on the rates of clocks.) Thus,
$T^i + \delta > (1+\rho)(t' - tmin^0) + T^0 + (i+1)\varepsilon$.

Now consider an alternative execution of the algorithm in which everything is exactly
like the one we have been describing, except that immediately after q sends out clock
reading $T^i$, q's clock $C^i_q$ begins to move at rate 1. This change cannot affect p's
(i+1)-st clock because q doesn't send any more messages until $T^{i+1}$, and these
messages aren't received until after the time when p sets its (i+1)-st clock.

By the lower bound on message delays, q's message to p took at least $\delta - \varepsilon$ time. Then
at real time t' (defined above), we have $C^i_q(t') \ge T^i + \delta - \varepsilon$. But then
$C^i_q(t') > (1+\rho)(t' - tmin^0) + T^0 + i\varepsilon$.

But then the inductive hypothesis is violated, since t', the time when p receives q's $T^i$
message, is greater than or equal to $u^{i-1}_q$, the time when q sets its round i clock. ∎


Now, we can state the validity condition. Let $\lambda = (P - (1+\rho)(\beta + \varepsilon) - \rho\delta)/(1+\rho)$. This is the
size of the shortest round in real time, since the amount of clock time elapsed during a round is at
least P minus the maximum adjustment.

Theorem 4-21: The algorithm preserves $(\alpha_1, \alpha_2, \alpha_3)$-validity,
where $\alpha_1 = 1 - \rho - \varepsilon/\lambda$, $\alpha_2 = 1 + \rho + \varepsilon/\lambda$, and $\alpha_3 = \varepsilon$.
Proof: We must show for all $t \ge t^0_p$ and all nonfaulty p that
$\alpha_1(t - tmax^0) + T^0 - \alpha_3 \le L_p(t) \le \alpha_2(t - tmin^0) + T^0 + \alpha_3$.
We know from the preceding lemma that for $i \ge 0$, $t \ge u^{i-1}_p$ (or $t^0_p$), and nonfaulty p,
$(1-\rho)(t - tmax^0) + T^0 - i\varepsilon \le C^i_p(t) \le (1+\rho)(t - tmin^0) + T^0 + i\varepsilon$.

Since $L_p(t)$ is equal to $C^i_p(t)$ for some i, we just need to convert i into an expression in
terms of t, etc. An upper bound on i is $1 + (t - tmax^0)/\lambda$. Then
$(1+\rho)(t - tmin^0) + T^0 + i\varepsilon \le (1+\rho)(t - tmin^0) + T^0 + (1 + (t - tmax^0)/\lambda)\varepsilon$
$\le (1 + \rho + \varepsilon/\lambda)(t - tmin^0) + T^0 + \varepsilon$, since $tmin^0 \le tmax^0$,
and that
$(1-\rho)(t - tmax^0) + T^0 - i\varepsilon \ge (1-\rho)(t - tmax^0) + T^0 - (1 + (t - tmax^0)/\lambda)\varepsilon$
$\ge (1 - \rho - \varepsilon/\lambda)(t - tmax^0) + T^0 - \varepsilon$.

The result follows. ∎
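
As an illustration of Theorem 4-21, the validity parameters can be computed from the system constants. The function name and the numeric values below are hypothetical; the name `lam` stands for the shortest real-time round length defined just before the theorem.

```python
def validity_parameters(P, beta, delta, eps, rho):
    """Return (alpha1, alpha2, alpha3) as given in Theorem 4-21."""
    # Shortest possible round in real time: P minus the largest adjustment,
    # measured on the fastest admissible clock.
    lam = (P - (1 + rho) * (beta + eps) - rho * delta) / (1 + rho)
    if lam <= 0:
        raise ValueError("round length P is too small for these constants")
    alpha1 = 1 - rho - eps / lam
    alpha2 = 1 + rho + eps / lam
    alpha3 = eps
    return alpha1, alpha2, alpha3


# Hypothetical parameters, in seconds.
a1, a2, a3 = validity_parameters(P=100.0, beta=0.005, delta=0.01,
                                 eps=0.001, rho=1e-6)
```

The rate bounds are symmetric about 1, and the extra term ε/λ reflects that each round can shift a clock by up to ε relative to real time.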


4.8 Reintegrating a Repaired Process 


Our algorithm can be modified to allow a faulty process which has been repaired to synchronize
its clock with the other nonfaulty processes. Let p be the process to be reintegrated into the
system. During some round i, p will gather messages from the other processes and perform the
same averaging procedure described previously to obtain a value for its correction variable such
that its clock becomes synchronized. Since p's clock is now synchronized, it will reach $T^{i+1}$
within $\beta$ of every other nonfaulty process. At that point, p is no longer faulty and rejoins the main
algorithm, sending out $T^{i+1}$ messages.


We assume that p can awaken at an arbitrary time during an execution, perhaps during the middle
of a round. It is necessary that p identify an appropriate round i at which it can obtain all the $T^i$
messages from nonfaulty processes. Since p might awaken during the middle of a round, p will
orient itself by observing the arriving messages. More specifically, p seeks an i such that f $T^{i-1}$
messages arrive within an interval of length at most $(1+\rho)(\beta + 2\varepsilon)$ as measured on its clock.
There will always be such an i because all messages from nonfaulty processes for each round
arrive within $\beta + 2\varepsilon$ real time of each other, and thus within $(1+\rho)(\beta + 2\varepsilon)$ clock time. At the
same time as p is orienting itself, it is collecting $T^j$ messages, for all j.


Assuming that p itself is still counted as one of the faulty processes, at least one of the f arriving
messages must be from a nonfaulty process. Thus, p knows that round i - 1 is in progress or has
just ended, and that it should use $T^i$ messages to update its clock.


Now p collects only $T^i$ messages. It must wait $(1+\rho)(\beta + 2\varepsilon + (1+\rho)(P + (1+\rho)(\beta + \varepsilon) + \rho\delta))$,
as measured on its clock, after receiving the f-th $T^{i-1}$ message in order to guarantee that it
has received $T^i$ messages from all nonfaulty processes. The maximum amount of real time p must
wait, $\beta + 2\varepsilon + (1+\rho)(P + (1+\rho)(\beta + \varepsilon) + \rho\delta)$, elapses if the f-th $T^{i-1}$ message is from a
nonfaulty process q and it took $\delta - \varepsilon$ time to arrive, if q's round i - 1 lasts as long as possible,
$(1+\rho)(P + (1+\rho)(\beta + \varepsilon) + \rho\delta)$ (because its clock is slow and it adds the maximum amount to its
clock), and if there is a nonfaulty process r that is $\beta$ behind q in reaching $T^i$ and its $T^i$ message to
p takes $\delta + \varepsilon$. The process waits this maximum amount of time multiplied by $(1+\rho)$ to account
for a fast clock.


(Some extra bookkeeping in the algorithm is necessitated by the fact that $T^i$ messages from
nonfaulty processes can arrive at p before p has received the f-th $T^{i-1}$ message. This scenario
shows why: Suppose p receives the first $T^{i-1}$ message at real time a, it is from a nonfaulty process
q, and its delay is $\delta + \varepsilon$, and that the f-th $T^{i-1}$ message is received $\beta + 2\varepsilon$ after the first one. Also
suppose that q's round i - 1 is as short as possible in real time, $(P - (1+\rho)(\beta + \varepsilon) - \rho\delta)/(1+\rho)$,
that there is a nonfaulty process r that begins round i $\beta$ before q does, and that r's $T^i$ message to p
arrives at real time b and has delay $\delta - \varepsilon$.


We show that $b < a + \beta + 2\varepsilon$, implying that the $T^i$ message is received before the f-th $T^{i-1}$
message.

$b = t^i_r + \delta - \varepsilon$
$= t^i_q - \beta + \delta - \varepsilon$
$= t^{i-1}_q + (P - (1+\rho)(\beta + \varepsilon) - \rho\delta)/(1+\rho) - \beta + \delta - \varepsilon$
$> t^{i-1}_q + ((1+\rho)(3\beta + 3\varepsilon) + \rho\delta - (1+\rho)(\beta + \varepsilon) - \rho\delta)/(1+\rho) - \beta + \delta - \varepsilon$, by lower bound on P
$= t^{i-1}_q + \beta + \delta + \varepsilon$
$= a - \delta - \varepsilon + \beta + \delta + \varepsilon$.


Thus, $b > a + \beta$. However, if P is very close to the lower bound, then b is approximately $a + \beta$,
which is less than $a + \beta + 2\varepsilon$.)


Immediately after p determines it has waited long enough, it carries out the averaging procedure
and determines a value for its correction variable.


We claim that p reaches $T^{i+1}$ on its new clock within $\beta$ of every other nonfaulty process. First,
observe that it does not matter that p's clock begins initially unsynchronized with all the other
clocks; the arbitrary clock will be compensated for in the subtraction of the average arrival time.
Second, observe that it does not matter that p is not sending out a $T^i$ message; p is being counted
as one of the faulty processes, which could always fail to send a message. (Processes do not
treat themselves specially in our algorithm, so it does not matter that p fails to receive a message
from itself.) Finally, observe that it does not matter that p adjusts its correction variable whenever
it is ready (rather than at the time specified for correct processes in the ordinary algorithm). The
adjustment is only the addition of a constant, so the (additive) effect of the change is the same in
either case.


We want to ensure that when a process that is reintegrating itself into the system finishes
collecting $T^i$ messages and updates its clock, this new clock hasn't already passed $T^{i+1}$. The
reason for ensuring this is that the process is supposed to be nonfaulty by $T^{i+1}$ and send out its
clock value at that time.


The code is in Figure 4-2. 


    beginstep(u)
    do forever
        if u = (T^i, q) and (q,T) ∉ INFO[i] for any T then
            INFO[i] := INFO[i] ∪ {(q,NOW)}
            if |{(q,T) ∈ INFO[i]: q is any process and
                     T ≥ NOW − (1 + ρ)(β + 2ε)}| = f
                then exit endif
            endif
        endstep
        beginstep(u)
        enddo

    /* p knows it should use round i values */

    do for each (q,T) ∈ INFO[i]
        ARR[q] := T
        enddo
    set-timer(NOW + (1 + ρ)(β + 2ε + (1 + ρ)(P + (1 + ρ)(β + ε) + ρδ)))
    endstep

    beginstep(u)
    while u = (T^i, q) for the chosen i do
        ARR[q] := NOW
        endstep
        beginstep(u)
        endwhile

    /* fall out of loop when timer goes off */

    AV := mid(reduce(ARR))
    ADJ := T^i + δ − AV
    CORR := CORR + ADJ
    set-timer(T^i + P)
    endstep

    /* switch to Algorithm 4-1 */

Figure 4-2: Algorithm 4-2, Reintegrating a Repaired Process


INFO is an array, each entry of which is a set of (process name, clock time) pairs. When a $T^i$
message arrives from process q, p checks that q hasn't already sent it a $T^i$ message. If not, then
q's name and the receiving time are added to the set of senders of $T^i$, INFO[i]. If f distinct $T^i$
messages have been received within the last $(1+\rho)(\beta + 2\varepsilon)$ time, then p knows which round's
T messages it should use to update its clock.
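
The orientation loop at the top of Figure 4-2 can be sketched in Python as follows. This is a simplified illustration, not the thesis code: the function name `orient`, the event-tuple layout `(round_index, sender, local_time)`, and the sample events are assumptions, and message delivery is modeled as a pre-recorded list rather than as interrupts.

```python
from collections import defaultdict


def orient(events, f, beta, eps, rho):
    """Return (i, info): the first round index i for which f distinct senders'
    round-i messages arrived within the window (1 + rho)(beta + 2 eps) on the
    local clock, plus the INFO table built so far. Returns (None, info) if no
    such round is found among the given events."""
    window = (1 + rho) * (beta + 2 * eps)
    info = defaultdict(dict)  # round index -> {sender: first local arrival time}
    for round_index, sender, now in events:
        seen = info[round_index]
        if sender in seen:
            continue  # q already sent a message for this round; ignore repeats
        seen[sender] = now
        recent = sum(1 for t in seen.values() if t >= now - window)
        if recent == f:
            return round_index, dict(info)
    return None, dict(info)


# Hypothetical trace: (round index, sender, local arrival time).
events = [(3, "a", 0.0), (3, "b", 5.0), (4, "c", 5.5), (3, "c", 5.9)]
chosen, info = orient(events, f=2, beta=1.0, eps=0.1, rho=0.0)
```

In this trace the early message from "a" falls outside the window, so the round is identified only once "b" and "c" have both reported within 1.2 time units of each other.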


The current lower bound on P, the round length, is not large enough to ensure that when the
reintegrating process finishes collecting $T^i$ messages and updates its clock, this new clock hasn't
already passed $T^{i+1}$.


There are two ways to solve this problem:
1. make the minimum P approximately three times as large as it currently must be;


2. have the process send out its clock value at $T^{i+2}$. It can be collecting $T^{i+1}$ messages
all along, but now it knows a tighter bound on when to stop collecting them (since its
(i+1)-st clock is synchronized with the other nonfaulty processes' clocks). This will
work as long as the time at which it stops collecting $T^i$ messages isn't after the
process' (i+2)-nd clock has reached $T^{i+2}$.


Now we show that P must be about three times as large as the previous lower bound in order to
prevent the reintegrating process from waiting too long before updating its clock. The actual
criterion we use is that the process must update its clock at least $\beta$ before any other nonfaulty
process' (i+1)-st clock reaches $T^{i+1}$. (Since the process' new clock is synchronized with those
of the nonfaulty processes, it will not reach $T^{i+1}$ more than $\beta$ before any other nonfaulty clock
does.)


Let p be a process being reintegrated during round i and let t be the real time when p stops
collecting $T^i$ messages.

Lemma 4-22: If $t < c^{i+1}_q(T^{i+1}) - \beta$ for any nonfaulty process q, then
$P > (6\beta + \delta + 9\varepsilon + \rho(8\beta + 3\delta + 16\varepsilon) + \rho^2(6\beta + \delta + 14\varepsilon) + \rho^3(4\beta + 3\delta + 8\varepsilon)$
$+ \rho^4(\beta + \delta + 2\varepsilon)) / (1 - 5\rho - 3\rho^2 - \rho^3)$.
Proof: The worst case occurs if p waits as long as possible to finish collecting $T^i$
messages and another nonfaulty process q reaches $T^{i+1}$ as soon as possible.

Suppose p receives the first $T^{i-1}$ message at real time t', and the f-th $T^{i-1}$ message at
$t' + (1+\rho)^2(\beta + 2\varepsilon)$ (because its clock is slow). According to the reintegration
algorithm, p will then wait $(1+\rho)(\beta + 2\varepsilon + (1+\rho)(P + (1+\rho)(\beta + 2\varepsilon) + \rho\delta))$ on its
clock, which means it will wait $(1+\rho)$ times as long in real time.

Thus, $t = t' + (1+\rho)^2(2\beta + 4\varepsilon + (1+\rho)(P + (1+\rho)(\beta + 2\varepsilon) + \rho\delta))$.

Now assume that the first $T^{i-1}$ message received by p is from a nonfaulty process q
and that it took $\delta + \varepsilon$ time to arrive. Thus $c^{i-1}_q(T^{i-1}) = t' - \delta - \varepsilon$. If round i - 1 and
round i both take the shortest amount of real time, $(1-\rho)(P - (1+\rho)(\beta + \varepsilon) - \rho\delta)$,
then
$c^{i+1}_q(T^{i+1}) = c^{i-1}_q(T^{i-1}) + 2(1-\rho)(P - (1+\rho)(\beta + \varepsilon) - \rho\delta)$.
We want to ensure that $c^{i+1}_q(T^{i+1}) - t > \beta$, i.e.,
$-\delta - \varepsilon + 2(1-\rho)(P - (1+\rho)(\beta + \varepsilon) - \rho\delta)$
$- (1+\rho)^2(2\beta + 4\varepsilon + (1+\rho)(P + (1+\rho)(\beta + 2\varepsilon) + \rho\delta)) > \beta$.

This inequality simplifies to the stated bound. ∎
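
The simplification at the end of this proof can be checked numerically. The sketch below evaluates the inequality from the proof directly and compares it with the closed-form bound of Lemma 4-22; the function names and the parameter values are illustrative assumptions.

```python
def slack(P, beta, delta, eps, rho):
    """Left side of the inequality in the proof minus beta.
    A positive value means p stops collecting early enough."""
    two_short_rounds = 2 * (1 - rho) * (P - (1 + rho) * (beta + eps) - rho * delta)
    wait = (1 + rho) ** 2 * (
        2 * beta + 4 * eps
        + (1 + rho) * (P + (1 + rho) * (beta + 2 * eps) + rho * delta)
    )
    return -delta - eps + two_short_rounds - wait - beta


def lemma_4_22_bound(beta, delta, eps, rho):
    """Closed-form lower bound on P stated in Lemma 4-22."""
    numerator = (6 * beta + delta + 9 * eps
                 + rho * (8 * beta + 3 * delta + 16 * eps)
                 + rho**2 * (6 * beta + delta + 14 * eps)
                 + rho**3 * (4 * beta + 3 * delta + 8 * eps)
                 + rho**4 * (beta + delta + 2 * eps))
    denominator = 1 - 5 * rho - 3 * rho**2 - rho**3
    return numerator / denominator


# Hypothetical parameters; rho is deliberately large so that the
# higher-order terms are exercised.
beta, delta, eps, rho = 0.005, 0.01, 0.001, 0.01
p_min = lemma_4_22_bound(beta, delta, eps, rho)
```

The slack is zero exactly at the stated bound and grows linearly in P with slope 1 - 5ρ - 3ρ² - ρ³, so any P above the bound satisfies the criterion.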
This new lower bound on P is about three times the size of the previous one, which was
$P > 2\beta + \delta + 2\varepsilon + 2\rho(\beta + \delta + \varepsilon)$.


If increasing the lower bound on P is unacceptable, the second solution can be employed. Its
drawback is that now it will take longer for a process to be reintegrated. A similar argument to the
above shows that in order to guarantee that p finishes collecting T messages at least $\beta$ before
any nonfaulty process reaches $T^{i+2}$, we must have


$P > (5\beta + \delta + 10\varepsilon + 2\rho(5\beta + 2\delta + 9\varepsilon)) / (2 - 4\rho)$, ignoring $\rho^2$ terms.


This lower bound is fairly close to the original one. For absolute certainty that the original lower 


bound will suffice, the process.can wait until T'*?, 
Chapter Five


Establishing Synchronization 


5.1 Introduction 


In this chapter we present an algorithm to synchronize clocks in a distributed system of 
processes, assuming the clocks initially have arbitrary values. The algorithm handles arbitrary 
failures of the processes and clock drift. We envision the processes running this algorithm until 
the desired degree of synchronization is obtained, and then switching to the maintenance 
algorithm described in the previous chapter. 


5.2 The Algorithm


5.2.1 General Description 

The structure of the start-up algorithm is similar to that of the algorithm which maintains 
synchronization. It runs in rounds. During each round, the processes exchange clock values and 
use the same fault-tolerant averaging function as before to calculate the corrections to their 
clocks. However, each round contains an additional phase, in which the processes exchange 
messages to decide that they are ready to begin the next round. This method of beginning rounds 
stands in contrast to that used by the maintenance algorithm, in which rounds begin when local 
clocks reach particular values. A more detailed description follows. 


Nonfaulty processes will begin each round within real time $\delta + 3\varepsilon$ of each other. Each nonfaulty
process begins the algorithm, and its round 0, as soon as it first receives a message. (It will be
shown that this must be within $\delta + 3\varepsilon$.) At the beginning of each round, each nonfaulty process p
broadcasts its local time. Then p waits a certain length of time guaranteed to be long enough for
it to receive a similar message from each nonfaulty process. At the end of this waiting interval, p
calculates the adjustment it will make to its clock at the current round, but does not make the
adjustment yet.


Then p waits a second interval of time before sending out additional messages, to make sure that
these new messages are not received before the other nonfaulty processes have reached the end
of their first waiting intervals. At the end of its second waiting interval, p broadcasts a READY
message indicating that it is ready to begin the next round. However, if p receives $f + 1$ READY
messages during its second waiting interval, it terminates its second interval early, and goes
ahead and broadcasts READY. As soon as p receives $n - f$ READY messages, it updates the
clock according to the adjustment calculated earlier, and begins its next round by broadcasting
its new clock value. (This algorithm uses some ideas from [3].)
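The two READY thresholds described above can be sketched as a small decision function. This is only an illustration in Python, not the thesis's code: the name `ready_phase_action` and its return labels are hypothetical, and the timers that normally end the second waiting interval are not modeled.

```python
def ready_phase_action(ready_senders, n, f, sent_ready):
    """Decide what a process does after recording a READY message.

    ready_senders: set of process ids whose READY has arrived this round
    n, f:          number of processes and maximum number of faulty ones
    sent_ready:    True once this process has broadcast its own READY
    (All names are hypothetical; timers are not modeled.)
    """
    count = len(ready_senders)
    if sent_ready and count >= n - f:
        # Enough READYs collected: apply the stored adjustment, start next round.
        return "begin_next_round"
    if not sent_ready and count >= f + 1:
        # f + 1 senders include at least one nonfaulty process,
        # so the second waiting interval may be cut short.
        return "broadcast_ready"
    return "wait"
```

With n = 4 and f = 1, two READY messages are enough to trigger an early broadcast, and three are needed to begin the next round.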


A process need only keep clock differences for one round at a time. The waiting intervals are
designed so that during round $i$ a nonfaulty process p will not receive a READY message from
another nonfaulty process until p has finished collecting round $i$ clock values. Round $i + 1$ clock
values are not broadcast until after READY is broadcast, so p will certainly not receive round $i + 1$
clock values until after it has finished collecting round $i$ clock values. However, round $i + 1$ clock
values might arrive during the second waiting interval and while the process is collecting READY
messages. As a result, the adjustment is calculated at the end of the first waiting interval and the
difference for any round $i + 1$ clock value received during round $i$ is decremented by the amount
of the adjustment.


5.2.2 Code for an Arbitrary Process 


Global constants: $\delta$, $\epsilon$, $\rho$, $n$, $f$: as usual.

Local variables (all initially arbitrary):
• T: clock time at which current round began.
• U: clock time at which the first waiting period is to end.
• V: clock time at which the second waiting period is to end.
• DIFF: array of clock differences between other processes and this one for current round.
• SENT-READY: set of processes from whom READY messages have been received in current round.
• CORR: correction variable.
• A: adjustment to clock.


The code is in Figure 5-1. 


beginstep(w)
do forever   /* each iteration is a round */
    T := NOW
    broadcast(T)
    U := T + (1 + \rho)(2\delta + 4\epsilon)
    set-timer(U)

    /* first waiting interval: collect clock values */

    while ~(w = TIMER & NOW = U) do
        if w = (m,q) then DIFF[q] := m + \delta - NOW endif
        endstep
        beginstep(w)
        endwhile

    /* end of first waiting interval */

    A := mid(reduce(DIFF))
    V := U + (1 + \rho)(4\epsilon + 4\rho(\delta + 2\epsilon) + 2\rho^2(\delta + 2\epsilon))
    set-timer(V)
    SENT-READY := \emptyset

    /* second waiting interval: collect READY messages and clock values
       for next round */

    while ~(w = TIMER & NOW = V) do
        if w = (READY,q) then
            SENT-READY := SENT-READY \cup {q}
            if |SENT-READY| = f + 1 then exit endif
        elseif w = (m,q) then DIFF[q] := m + \delta - NOW endif
        endstep
        beginstep(w)
        endwhile

    /* end of second waiting interval due to timer or f + 1 READY messages */

    broadcast(READY)
    endstep
    beginstep(w)

    /* collect n - f READY messages and next round clock values */

    while true do
        if w = (READY,q) then
            SENT-READY := SENT-READY \cup {q}
            if |SENT-READY| = n - f then exit endif
        elseif w = (m,q) then DIFF[q] := m + \delta - NOW endif
        endstep
        beginstep(w)
        endwhile

    /* update clock and begin next round */

    DIFF := DIFF - A
    CORR := CORR + A
    endstep
    beginstep(w)
    enddo


Figure 5-1: Algorithm 5-1, Establishing Synchronization
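The two timer assignments in Figure 5-1 can be evaluated directly. The following Python sketch is an illustration only: the function name `waiting_deadlines` and its argument names are hypothetical, and it assumes the two assignments read U := T + (1 + ρ)(2δ + 4ε) and V := U + (1 + ρ)(4ε + 4ρ(δ + 2ε) + 2ρ²(δ + 2ε)).

```python
def waiting_deadlines(T, delta, eps, rho):
    """Return the local clock times (U, V) at which the two waiting intervals end.

    T:     local clock time at which the round began
    delta: median message delay; eps: delay uncertainty; rho: drift bound
    (Hypothetical helper; it mirrors the two assignments in Figure 5-1.)
    """
    U = T + (1 + rho) * (2 * delta + 4 * eps)
    V = U + (1 + rho) * (
        4 * eps + 4 * rho * (delta + 2 * eps) + 2 * rho**2 * (delta + 2 * eps)
    )
    return U, V
```

With drift-free clocks (rho = 0) the first interval lasts 2δ + 4ε and the second lasts 4ε on the local clock.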


5.3 Analysis


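We will use the following notation in addition to that introduced previously:
• $\mathrm{VAL}_p^i(q)$ is the value of q's round $i$ message to p.
• $\mathrm{DIFF}_p^i(q) = \mathrm{VAL}_p^i(q) + \delta - \mathrm{ARR}_p^i(q)$.
• $\mathrm{DIFF}_p^i$ is the multiset of the values $\mathrm{DIFF}_p^i(q)$.
• $t_p^i$ is the real time when p begins round $i$.
• $u_p^i$ is the real time when p begins the second waiting interval during round $i$.
• $v_p^i$ is the real time when p sends READY during round $i$ (and thus ends the second waiting interval).
• $\mathrm{arr}_p^i(q)$ is the real time when p receives q's round $i$ clock value.
• $\mathrm{rdy}_p^i(q)$ is the real time when p first receives READY from q during round $i$.
• $\mathrm{tmax}^i = \max\{t_p^i\}$ for p nonfaulty, the latest real time when a nonfaulty process begins round $i$.
• $B^i = \max\{|C_p^i(\mathrm{tmax}^i) - C_q^i(\mathrm{tmax}^i)|\}$ for nonfaulty p and q.


Note that $\mathrm{tmax}^i$ has a slightly different meaning from that in Chapter 4.


From now on, terms of order $\rho^2$ and higher will be ignored. Since $10^{-6}$ is an often quoted
reasonable value for $\rho$ [5, 7], terms of order $\rho^2$ are negligible. The second-order terms in the
assignment to V in line 13 of the code are needed for correctness but will not appear in the
analysis.

Lemma 5-1: Let $i \ge 0$, let p and q be nonfaulty, and let r be the first nonfaulty process to send READY in round $i$. Then

$$\mathrm{rdy}_p^i(q) \ge u_p^i - (t_p^i - t_r^i) + \delta + 3\epsilon.$$

Proof:

$$\begin{aligned}
\mathrm{rdy}_p^i(q) &\ge v_q^i + \delta - \epsilon \\
&\ge v_r^i + \delta - \epsilon \\
&\ge t_r^i + (2\delta + 4\epsilon) + (4\epsilon + 4\rho(\delta + 2\epsilon)) + \delta - \epsilon, \text{ by definition of } v_r^i \text{ and the upper bound on the drift rate} \\
&= t_r^i + 3\delta + 7\epsilon + 4\rho\delta + 8\rho\epsilon,
\end{aligned}$$

and

$$\begin{aligned}
u_p^i &= t_r^i + (t_p^i - t_r^i) + (u_p^i - t_p^i) \\
&\le t_r^i + (t_p^i - t_r^i) + (1 + \rho)^2(2\delta + 4\epsilon), \text{ by definition of } u_p^i \text{ and the lower bound on the drift rate} \\
&= t_r^i + (t_p^i - t_r^i) + 2\delta + 4\epsilon + 4\rho\delta + 8\rho\epsilon.
\end{aligned}$$

Thus, $t_r^i \ge u_p^i - (t_p^i - t_r^i) - 2\delta - 4\epsilon - 4\rho\delta - 8\rho\epsilon$, implying

$$\begin{aligned}
\mathrm{rdy}_p^i(q) &\ge u_p^i - (t_p^i - t_r^i) - 2\delta - 4\epsilon - 4\rho\delta - 8\rho\epsilon + 3\delta + 7\epsilon + 4\rho\delta + 8\rho\epsilon \\
&= u_p^i - (t_p^i - t_r^i) + \delta + 3\epsilon. \quad \Box
\end{aligned}$$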

Lemma 5-2: For any nonfaulty processes p and q and any $i \ge 0$,

(a) $|t_p^i - t_q^i| \le \delta + 3\epsilon$, and

(b) $\mathrm{rdy}_p^i(q) \ge u_p^i$.

Proof: We proceed by induction on $i$.

Basis: $i = 0$.

(a) $|t_p^0 - t_q^0| \le \delta + \epsilon$, because as soon as p wakes up, it sends its round 0 message to
all other processes. The receipt of this message, which occurs at most $\delta + \epsilon$ later,
causes q to begin round 0, if it has not already done so.

(b) Let r be the first nonfaulty process to send READY at round 0. By Lemma 5-1,

$$\begin{aligned}
\mathrm{rdy}_p^0(q) &\ge u_p^0 - (t_p^0 - t_r^0) + \delta + 3\epsilon \\
&\ge u_p^0 - (\delta + \epsilon) + \delta + 3\epsilon, \text{ by part (a)} \\
&\ge u_p^0.
\end{aligned}$$

Induction: Assume for $i - 1$ and show for $i$.

(a) Let s be the first nonfaulty process to begin round $i$. Then s receives $n - f$ READY
messages during its round $i - 1$ (after $u_s^{i-1}$). At least $n - 2f$ of them are from nonfaulty
processes by part (b) of the induction hypothesis. These $n - 2f$ nonfaulty processes
also send READY messages to all the other processes. By $t_s^i + 2\epsilon$, every nonfaulty
process receives at least $n - 2f \ge f + 1$ READY messages and broadcasts READY.
Thus q receives $n - f$ READY messages by $t_s^i + 2\epsilon + \delta + \epsilon$. Thus,

$$\begin{aligned}
t_q^i &\le t_s^i + \delta + 3\epsilon \\
&\le t_p^i + \delta + 3\epsilon, \text{ by choice of s,}
\end{aligned}$$

which implies $t_q^i - t_p^i \le \delta + 3\epsilon$.

By reversing the roles of p and q in the above argument, we obtain $t_p^i - t_q^i \le \delta + 3\epsilon$.

(b) Let r be the first nonfaulty process to send READY at round $i$. By Lemma 5-1,

$$\begin{aligned}
\mathrm{rdy}_p^i(q) &\ge u_p^i - (t_p^i - t_r^i) + \delta + 3\epsilon \\
&\ge u_p^i - (\delta + 3\epsilon) + \delta + 3\epsilon, \text{ by part (a)} \\
&= u_p^i. \quad \Box
\end{aligned}$$


Next we show that a process waits a sufficient length of time to receive clock values from all 
nonfaulty processes before beginning the second waiting interval in a round. 
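Lemma 5-3: Let p and q be nonfaulty, and $i \ge 0$. Then $\mathrm{arr}_p^i(q) \le u_p^i$.

Proof: By the lower bound on the drift rate, $u_p^i \ge t_p^i + 2\delta + 4\epsilon$. Lemma 5-2 implies
that q sends its round $i$ clock value by $t_p^i + \delta + 3\epsilon$. Thus $\mathrm{arr}_p^i(q) \le t_p^i + 2\delta + 4\epsilon \le u_p^i$. $\Box$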
The next two lemmas bound how long a round can last for one process. First we bound how long
a process must wait after sending READY to receive $n - f$ READY messages.
Lemma 5-4: For p nonfaulty and $i \ge 0$, $t_p^{i+1} - v_p^i \le 2\delta + 4\epsilon + 4\rho(\delta + 4\epsilon)$.
Proof: The worst case occurs if p is as far ahead of the other nonfaulty processes as
possible, its clock is fast, the other clocks are slow, and the slow processes' READY
messages take as long as possible to arrive. However, as soon as they arrive, p begins
the next round. Let q be one of the slow nonfaulty processes.

$$\begin{aligned}
t_p^{i+1} - v_p^i &= (t_p^{i+1} - v_q^i) + (v_q^i - u_q^i) + (u_q^i - t_q^i) + (t_q^i - t_p^i) - (v_p^i - u_p^i) - (u_p^i - t_p^i) \\
&\le (\delta + \epsilon) + (1 + \rho)^2(4\epsilon + 4\rho(\delta + 2\epsilon)) + (1 + \rho)^2(2\delta + 4\epsilon) + (\delta + 3\epsilon) \\
&\qquad - (4\epsilon + 4\rho(\delta + 2\epsilon)) - (2\delta + 4\epsilon) \\
&= 2\delta + 4\epsilon + 4\rho(\delta + 4\epsilon), \text{ ignoring } \rho^2 \text{ terms.} \quad \Box
\end{aligned}$$
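Lemma 5-5: For any nonfaulty process p and any $i \ge 0$,

$$t_p^{i+1} - t_p^i \le 4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon).$$

Proof:

$$\begin{aligned}
t_p^{i+1} - t_p^i &= (t_p^{i+1} - v_p^i) + (v_p^i - u_p^i) + (u_p^i - t_p^i) \\
&\le 2\delta + 4\epsilon + 4\rho(\delta + 4\epsilon) + (v_p^i - u_p^i) + (u_p^i - t_p^i), \text{ by Lemma 5-4} \\
&\le 2\delta + 4\epsilon + 4\rho(\delta + 4\epsilon) + (1 + \rho)^2(4\epsilon + 4\rho(\delta + 2\epsilon)) + (1 + \rho)^2(2\delta + 4\epsilon) \\
&= 4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon). \quad \Box
\end{aligned}$$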
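The arithmetic behind Lemma 5-5 can be checked numerically. The Python sketch below is an illustration, not part of the thesis: the function names are hypothetical, and it assumes the three summands are the Lemma 5-4 bound 2δ + 4ε + 4ρ(δ + 4ε), the second interval (1 + ρ)²(4ε + 4ρ(δ + 2ε)), and the first interval (1 + ρ)²(2δ + 4ε). The gap between the full sum and the closed form should consist only of terms of order ρ² and higher.

```python
def round_length_terms(delta, eps, rho):
    """Sum the three pieces used in the proof of Lemma 5-5, dropping nothing."""
    wait_for_readys = 2 * delta + 4 * eps + 4 * rho * (delta + 4 * eps)  # Lemma 5-4
    second_interval = (1 + rho) ** 2 * (4 * eps + 4 * rho * (delta + 2 * eps))
    first_interval = (1 + rho) ** 2 * (2 * delta + 4 * eps)
    return wait_for_readys + second_interval + first_interval


def round_length_bound(delta, eps, rho):
    """Closed form of Lemma 5-5, with rho**2 terms ignored."""
    return 4 * delta + 12 * eps + 4 * rho * (3 * delta + 10 * eps)
```

For example, with delta = 0.01, eps = 0.001 and rho = 1e-6 (values chosen only for illustration) the two functions differ by a quantity on the order of rho squared.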


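Now we give an upper bound on how far apart $\mathrm{tmax}^i$ and $\mathrm{tmax}^{i+1}$ can be.
Lemma 5-6: For any $i \ge 0$,

$$\mathrm{tmax}^{i+1} - \mathrm{tmax}^i \le 4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon).$$

Proof: Let p be the nonfaulty process such that $t_p^{i+1} = \mathrm{tmax}^{i+1}$. Then

$$\begin{aligned}
\mathrm{tmax}^{i+1} - \mathrm{tmax}^i &= t_p^{i+1} - \mathrm{tmax}^i \le t_p^{i+1} - t_p^i \\
&\le 4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon), \text{ by Lemma 5-5.} \quad \Box
\end{aligned}$$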
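Lemma 5-7 bounds the amount of real time between the time a nonfaulty process receives a
round $i$ message from another nonfaulty process and the time the last nonfaulty process begins
round $i + 1$.
Lemma 5-7: For any $i \ge 0$ and nonfaulty processes p and q,

$$\mathrm{tmax}^{i+1} - \mathrm{arr}_p^i(q) \le 5\delta + 19\epsilon + 4\rho(3\delta + 10\epsilon).$$

Proof:

$$\begin{aligned}
\mathrm{tmax}^{i+1} - \mathrm{arr}_p^i(q) &= (\mathrm{tmax}^{i+1} - t_p^{i+1}) + (t_p^{i+1} - t_p^i) + (t_p^i - t_q^i) - (\mathrm{arr}_p^i(q) - t_q^i) \\
&\le (\delta + 3\epsilon) + (4\delta + 12\epsilon + 4\rho(3\delta + 10\epsilon)) + (\delta + 3\epsilon) - (\delta - \epsilon), \\
&\qquad \text{by Lemmas 5-2 and 5-5 and the lower bound on the message delay} \\
&= 5\delta + 19\epsilon + 4\rho(3\delta + 10\epsilon). \quad \Box
\end{aligned}$$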


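The next lemma bounds the error in a nonfaulty process' estimate of another nonfaulty process'
local time at a particular real time.
Lemma 5-8: Let p and r be nonfaulty. Then

$$|\mathrm{DIFF}_p^i(r) + C_p^i(\mathrm{tmax}^{i+1}) - C_r^i(\mathrm{tmax}^{i+1})| \le \epsilon + \rho(11\delta + 39\epsilon).$$

Proof:

$$|\mathrm{DIFF}_p^i(r) + C_p^i(\mathrm{tmax}^{i+1}) - C_r^i(\mathrm{tmax}^{i+1})| = |\mathrm{VAL}_p^i(r) + \delta - \mathrm{ARR}_p^i(r) + C_p^i(\mathrm{tmax}^{i+1}) - C_r^i(\mathrm{tmax}^{i+1})|.$$

If the quantity in the absolute value signs is negative, then this expression is equal to

$$\begin{aligned}
&C_r^i(\mathrm{tmax}^{i+1}) - C_p^i(\mathrm{tmax}^{i+1}) + C_p^i(\mathrm{arr}_p^i(r)) - \delta - \mathrm{VAL}_p^i(r) \\
&\quad\le C_r^i(\mathrm{tmax}^{i+1}) - C_p^i(\mathrm{tmax}^{i+1}) + C_p^i(\mathrm{arr}_p^i(r)) - \delta - C_r^i(\mathrm{arr}_p^i(r) - \delta - \epsilon), \text{ since the delay is at most } \delta + \epsilon \\
&\quad\le C_r^i(\mathrm{tmax}^{i+1}) - C_p^i(\mathrm{tmax}^{i+1}) + C_p^i(\mathrm{arr}_p^i(r)) - \delta - C_r^i(\mathrm{arr}_p^i(r)) + (1 + \rho)(\delta + \epsilon), \text{ since the clock drift is at most } 1 + \rho \\
&\quad= (C_r^i(\mathrm{tmax}^{i+1}) - C_p^i(\mathrm{tmax}^{i+1})) - (C_r^i(\mathrm{arr}_p^i(r)) - C_p^i(\mathrm{arr}_p^i(r))) - \delta + \delta + \epsilon + \rho\delta + \rho\epsilon \\
&\quad\le 2\rho(\mathrm{tmax}^{i+1} - \mathrm{arr}_p^i(r)) + \epsilon + \rho\delta + \rho\epsilon, \text{ by Lemma 4-2} \\
&\quad\le 2\rho(5\delta + 19\epsilon) + \epsilon + \rho\delta + \rho\epsilon, \text{ by Lemma 5-7} \\
&\quad= \epsilon + \rho(11\delta + 39\epsilon).
\end{aligned}$$

If the quantity in the absolute value signs is positive, a similar argument shows that
$|\mathrm{DIFF}_p^i(r) + C_p^i(\mathrm{tmax}^{i+1}) - C_r^i(\mathrm{tmax}^{i+1})| \le \epsilon + \rho(11\delta + 37\epsilon)$. $\Box$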


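The next lemma bounds how far apart two processes' $i$-th clocks are at the time when the last
process begins round $i + 1$. The bound is in terms of how far apart the clocks are when the last
process begins round $i$.
Lemma 5-9: For any nonfaulty p and q, and any $i$,

$$|C_p^i(\mathrm{tmax}^{i+1}) - C_q^i(\mathrm{tmax}^{i+1})| \le B^i + 8\rho(\delta + 3\epsilon).$$

Proof:

$$\begin{aligned}
|C_p^i(\mathrm{tmax}^{i+1}) - C_q^i(\mathrm{tmax}^{i+1})| &\le |C_p^i(\mathrm{tmax}^i) - C_q^i(\mathrm{tmax}^i)| \\
&\qquad + |(C_p^i(\mathrm{tmax}^{i+1}) - C_q^i(\mathrm{tmax}^{i+1})) - (C_p^i(\mathrm{tmax}^i) - C_q^i(\mathrm{tmax}^i))| \\
&\le B^i + 2\rho(\mathrm{tmax}^{i+1} - \mathrm{tmax}^i), \text{ by definition of } B^i \text{ and Lemma 4-2} \\
&\le B^i + 2\rho(4\delta + 12\epsilon), \text{ by Lemma 5-6 and ignoring } \rho^2 \text{ terms} \\
&= B^i + 8\rho(\delta + 3\epsilon). \quad \Box
\end{aligned}$$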
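Now we can state the main result, bounding $B^{i+1}$ in terms of $B^i$.


Theorem 5-10: $B^{i+1} \le \tfrac{1}{2}B^i + 2\epsilon + 2\rho(11\delta + 39\epsilon)$.
Proof: $B^{i+1} = \max\{|C_p^{i+1}(\mathrm{tmax}^{i+1}) - C_q^{i+1}(\mathrm{tmax}^{i+1})|\}$ for nonfaulty p and q.


Let $x = \epsilon + \rho(11\delta + 39\epsilon)$.


We now define three multisets U, V, and W that satisfy the hypotheses of Lemma A-4.
Let

$$\begin{aligned}
U &= \mathrm{DIFF}_p^i + C_p^i(\mathrm{tmax}^{i+1}), \\
V &= \mathrm{DIFF}_q^i + C_q^i(\mathrm{tmax}^{i+1}), \text{ and} \\
W &= \{C_r^i(\mathrm{tmax}^{i+1}) : r \text{ is nonfaulty}\}.
\end{aligned}$$

U and V have size $n$; W has size $n - f$.

Define an injection from W to U as follows. Map each element $C_r^i(\mathrm{tmax}^{i+1})$ in W to
$\mathrm{DIFF}_p^i(r) + C_p^i(\mathrm{tmax}^{i+1})$ in U. Since Lemma 5-8 implies that

$$|\mathrm{DIFF}_p^i(r) + C_p^i(\mathrm{tmax}^{i+1}) - C_r^i(\mathrm{tmax}^{i+1})| \le x$$

for all the $n - f$ nonfaulty processes, $d_x(W,U) = 0$. Similarly, $d_x(W,V) = 0$.

By Lemma 5-9, $\mathrm{diam}(W) \le B^i + 8\rho(\delta + 3\epsilon)$. Thus, Lemma A-4 implies

$$\begin{aligned}
|\mathrm{mid}(\mathrm{reduce}(U)) - \mathrm{mid}(\mathrm{reduce}(V))| &\le \tfrac{1}{2}\mathrm{diam}(W) + 2x \\
&= \tfrac{1}{2}B^i + 2\epsilon + 2\rho(11\delta + 39\epsilon).
\end{aligned}$$

Since

$$\begin{aligned}
\mathrm{mid}(\mathrm{reduce}(U)) &= \mathrm{mid}(\mathrm{reduce}(\mathrm{DIFF}_p^i + C_p^i(\mathrm{tmax}^{i+1}))) \\
&= \mathrm{mid}(\mathrm{reduce}(\mathrm{DIFF}_p^i)) + C_p^i(\mathrm{tmax}^{i+1}) \\
&= \mathrm{ADJ}_p^i + C_p^i(\mathrm{tmax}^{i+1}) \\
&= C_p^{i+1}(\mathrm{tmax}^{i+1})
\end{aligned}$$

and similarly $\mathrm{mid}(\mathrm{reduce}(V)) = C_q^{i+1}(\mathrm{tmax}^{i+1})$, the result follows. $\Box$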


We obtain an approximate bound on how closely this algorithm will synchronize the clocks by
considering the limit of $B^i$ as the round number increases without bound.
Theorem 5-11: This algorithm can synchronize clocks to within $4\epsilon + 4\rho(11\delta + 39\epsilon)$.


Proof:

$$\begin{aligned}
\lim_{i \to \infty} B^i &= \lim_{i \to \infty} \left[ B^0/2^i + (1 + 1/2 + \cdots + 1/2^{i-1})(2\epsilon + 2\rho(11\delta + 39\epsilon)) \right] \\
&= 4\epsilon + 4\rho(11\delta + 39\epsilon), \text{ since the limit of the geometric series is 2.} \quad \Box
\end{aligned}$$
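The convergence claimed in Theorem 5-11 can be observed by iterating the recurrence of Theorem 5-10 with equality. The Python sketch below is illustrative only: the function names are hypothetical, and the per-round term 2ε + 2ρ(11δ + 39ε) is taken as stated in Theorem 5-10.

```python
def per_round_error(delta, eps, rho):
    """Additive term in the recurrence of Theorem 5-10."""
    return 2 * eps + 2 * rho * (11 * delta + 39 * eps)


def spread_bound(b0, rounds, delta, eps, rho):
    """Worst-case bound on B^i after `rounds` rounds, starting from B^0 = b0."""
    b = b0
    for _ in range(rounds):
        b = b / 2 + per_round_error(delta, eps, rho)  # halve, then add the error term
    return b


def limiting_spread(delta, eps, rho):
    """Limit stated in Theorem 5-11."""
    return 4 * eps + 4 * rho * (11 * delta + 39 * eps)
```

Because the initial spread is halved every round, even a very large `b0` is forgotten after a few dozen rounds, and the bound settles at twice the per-round error term.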


As was the case for Algorithm 4-1, if the number of processes, $n$, increases while $f$, the number of
faulty processes, remains fixed, a greater closeness of synchronization can be achieved by
modifying Algorithm 5-1 so that it computes the mean instead of the midpoint of the range of
values.


After modifying Algorithm 5-1, we get

$$B^{i+1} \le B^i f/(n - 2f) + 2\epsilon + 2\rho(11\delta + 39\epsilon).$$

This is the same as

$$B^i \le B^0 (f/(n - 2f))^i + \frac{1 - (f/(n - 2f))^i}{1 - f/(n - 2f)} (2\epsilon + 2\rho(11\delta + 39\epsilon)),$$

which approaches $2\epsilon + 2\rho(11\delta + 39\epsilon)$ as $n$ approaches infinity.
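The effect of n on the mean variant can be made concrete by computing the fixed point of its recurrence. This is a sketch under assumptions: the function name `mean_variant_limit` is hypothetical, and the recurrence is taken to be B ← B·f/(n − 2f) + 2ε + 2ρ(11δ + 39ε), whose limit as the round number grows is the additive term divided by 1 − f/(n − 2f).

```python
def mean_variant_limit(n, f, delta, eps, rho):
    """Limit of the spread bound for the mean variant as the round number grows.

    Requires n > 3f so that the convergence rate f / (n - 2f) is below 1.
    """
    if n <= 3 * f:
        raise ValueError("need n > 3f")
    rate = f / (n - 2 * f)
    c = 2 * eps + 2 * rho * (11 * delta + 39 * eps)
    return c / (1 - rate)
```

With n = 4 and f = 1 the rate is 1/2 and the limit coincides with that of the midpoint version; as n grows with f fixed, the limit falls toward 2ε + 2ρ(11δ + 39ε).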


5.4 Determining the Number of Rounds 


The nonfaulty processes must determine how many rounds of this algorithm must be run to
establish the desired degree of synchronization before switching to the maintenance algorithm.
The basic idea is for each nonfaulty process p to estimate $B^0$, and then calculate a sufficient
number of rounds, $\mathrm{NROUNDS}_p$, using the known rate of convergence. $B^0$ is estimated by having
p calculate an overestimate and an underestimate for $C_q^0(\mathrm{tmax}^0)$ for each q, and letting the
estimated $B^0$ be the difference between the maximum overestimate and the minimum
underestimate.


Let p's overestimate for $C_q^0(\mathrm{tmax}^0)$ be $\mathrm{OVER}_p(q)$ and p's underestimate for $C_q^0(\mathrm{tmax}^0)$ be
$\mathrm{UNDER}_p(q)$.


For the overestimate, we assume that q's clock is fast, and that the maximum amount of time
elapses between $t_q^0$ (when q sent the message) and $\mathrm{tmax}^0$. That maximum is $\delta + \epsilon$ since every
nonfaulty process begins round 0 as soon as it receives a message. Thus,

$$\mathrm{OVER}_p(q) = \mathrm{VAL}_p^0(q) + (1 + \rho)(\delta + \epsilon).$$

Similarly, we can derive the underestimate. We assume that q is the last nonfaulty process to
begin round 0. Thus,

$$\mathrm{UNDER}_p(q) = \mathrm{VAL}_p^0(q).$$

Process p computes its estimate of $B^0$,

$$B_p = \max_q\{\mathrm{OVER}_p(q)\} - \min_q\{\mathrm{UNDER}_p(q)\}.$$
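The estimate of the initial spread can be sketched in a few lines of Python. The function name `estimate_initial_spread` and the representation of the received round 0 values as a plain list are assumptions made for illustration; the formulas follow the overestimate VAL + (1 + ρ)(δ + ε) and the underestimate VAL described above.

```python
def estimate_initial_spread(round0_values, delta, eps, rho):
    """Estimate B^0 from the round 0 clock values one process has received.

    round0_values: the values VAL(q), one per sender q (hypothetical input format)
    """
    over = [v + (1 + rho) * (delta + eps) for v in round0_values]  # q's clock is fast
    under = list(round0_values)                                    # q began round 0 last
    return max(over) - min(under)
```

For received values 10.0, 12.0 and 11.5 with delta = 1.0, eps = 0.25 and rho = 0, the estimate is (12.0 + 1.25) - 10.0 = 3.25.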


Now p estimates how many rounds are needed until the spread is close enough. There is a
predetermined $\gamma > 4\epsilon + 4\rho(11\delta + 39\epsilon)$, which is the desired closeness of synchronization for
the start-up algorithm. After $j$ rounds,

$$B^j \le B_p/2^j + (1 + 1/2 + \cdots + 1/2^{j-1})(2\epsilon + 2\rho(11\delta + 39\epsilon)).$$

Process p sets the right hand side equal to $\gamma$ and solves for $j$ to obtain its estimate of the required
number of rounds, $\mathrm{NROUNDS}_p$.
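One way to carry out this calculation is to iterate the bound until it drops to the target, which yields the smallest whole number of rounds for which the right-hand side is at most the target. The sketch below is an assumption about how the step could be implemented, not the thesis's own procedure; the names `rounds_needed`, `b_p` and `gamma` are hypothetical.

```python
def rounds_needed(b_p, gamma, delta, eps, rho):
    """Smallest j with b_p / 2**j + (1 + 1/2 + ... + 1/2**(j-1)) * c <= gamma.

    c is the per-round term 2*eps + 2*rho*(11*delta + 39*eps); gamma must
    exceed 2*c, the limit of the bound, or no number of rounds would suffice.
    """
    c = 2 * eps + 2 * rho * (11 * delta + 39 * eps)
    if gamma <= 2 * c:
        raise ValueError("gamma must exceed 4*eps + 4*rho*(11*delta + 39*eps)")
    bound, j = b_p, 0
    while bound > gamma:
        bound = bound / 2 + c  # one more round: halve the spread, add the error term
        j += 1
    return j
```

Solving the equation in closed form gives j = log2((B_p - 2c) / (gamma - 2c)) when B_p exceeds gamma; rounding that up agrees with the loop.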




Now each process executes a Byzantine Agreement protocol on the vector of NROUNDS values, 
one value for each process. The processes are guaranteed to have the same vector at the end of 
the Byzantine Agreement protocol. Each process chooses the $(f + 1)$-st smallest element of the
resulting vector as the required number of rounds. The smallest number of rounds computed by a 
nonfaulty process will suffice to achieve the desired closeness of synchronization. Variations in 
the number of rounds computed by different nonfaulty processes are due to spurious values 
introduced by faulty processes and to different message delays. However, the range computed 
by any nonfaulty process is guaranteed to include the actual values of all nonfaulty processes at 
$\mathrm{tmax}^0$, so the range determined by the process that computes the smallest number of rounds also
includes all the actual values. In order to guarantee that each process chooses a number of 
rounds that is at least as large as the smallest one computed by a nonfaulty process, it chooses 
the (f + 1)-st smallest element of the vector of values. 
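The selection rule is a one-line computation once agreement on the vector has been reached. In this Python sketch the function name `agreed_round_count` is hypothetical, and the vector is assumed to be a list with one entry per process.

```python
def agreed_round_count(nrounds_vector, f):
    """Return the (f + 1)-st smallest entry of the agreed vector.

    At most f entries come from faulty processes, so the result is at least
    as large as the smallest value computed by a nonfaulty process.
    """
    if len(nrounds_vector) <= f:
        raise ValueError("vector must have more than f entries")
    return sorted(nrounds_vector)[f]
```

For the vector [3, 100, 5, 4] and f = 1 the chosen value is 4: a single faulty process reporting an absurdly small or large count cannot pull the result outside the range of nonfaulty values.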


Any Byzantine Agreement protocol requires at least f + 1 rounds. The processes can execute 
this algorithm in parallel with the clock synchronization algorithm, beginning at round 0. The 
clock synchronization algorithm imposes a round structure on the processes’ communications. 
The Byzantine Agreement algorithm can be executed using this round structure. Each BA 
message can also include information needed for the clock synchronization algorithm (namely, 
the current clock value). However, the processes will always need to do at least f + 2 rounds, one 
to obtain the estimated number of rounds and f + 1 for the Byzantine Agreement algorithm. 


5.5 Switching to the Maintenance Algorithm 


After the processes have done the required number of rounds (denoted by r throughout this 
section) of the start-up algorithm, they cease executing it. The processes should begin the 
maintenance algorithm as soon as possible after ending the start-up algorithm in order to
minimize the inaccuracy introduced by the clock drift. 


In the maintenance algorithm each process broadcasts its clock value when its clock reaches $T^i$,
for $i = 0, 1, \ldots$, where $T^{i+1} = T^i + P$. Let $T^0$ be a multiple of P. It is shown below in Lemma 5-13
that the first multiple of P reached by nonfaulty p's clock after finishing the required r rounds
differs by at most one from the first multiple reached by nonfaulty q's clock after the r rounds.
When a process reaches the first multiple of P after it has ended the start-up algorithm, it
broadcasts its clock value as in the maintenance algorithm, but doesn't update its clock. At the
next multiple of P, the process begins the full maintenance algorithm by broadcasting its clock
value and updating its clock. (It will receive clock values from all nonfaulty processes.)

The analysis introduces a new quantity, $\beta_r$, representing an upper bound on the closeness of the
nonfaulty processes' clocks at $\mathrm{tmax}^r$. That is, for any nonfaulty processes p and q,
$|C_p^r(\mathrm{tmax}^r) - C_q^r(\mathrm{tmax}^r)| \le \beta_r$. We show that if the following five inequalities are satisfied by the parameters,
then the switch from the start-up algorithm to the maintenance algorithm (with parameter $\beta$) can
be accomplished.


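(1) $\beta_r > 4\epsilon + 4\rho(11\delta + 39\epsilon)$
(2) $\beta \ge (\beta_r + 2\epsilon + \rho(6P - \beta_r + 2\delta + 12\epsilon)) / (1 - 8\rho)$
(3) $P > 2(1 + \rho)(\beta + \epsilon) + (1 + \rho)\max\{\delta, \beta + \epsilon\} + \rho\delta$
(4) $P \le \beta/4\rho - \epsilon/\rho - \rho(\beta + \delta + \epsilon) - 2\beta - \delta - 2\epsilon$
(5) $\beta \ge 4\epsilon + 4\rho(3\beta + \delta + 3\epsilon) + 8\rho^2(\beta + \delta + \epsilon)$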


The first inequality is imposed by the limitation on how closely the start-up algorithm can 
synchronize. The second inequality reflects the inaccuracy introduced during the switch. The 
last three are simply repeated from Section 4.5.1. 
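A parameter choice can be screened against the five inequalities mechanically. The following Python sketch is illustrative: the function name `violated_switch_inequalities` is hypothetical, `beta_r` stands for the bound on the spread at the end of the start-up algorithm, and the exact form and strictness of each comparison is an assumption read from the list above.

```python
def violated_switch_inequalities(beta_r, beta, P, delta, eps, rho):
    """Return the numbers of the inequalities (1)-(5) that fail; rho must be > 0."""
    checks = {
        1: beta_r > 4 * eps + 4 * rho * (11 * delta + 39 * eps),
        2: beta >= (beta_r + 2 * eps + rho * (6 * P - beta_r + 2 * delta + 12 * eps))
        / (1 - 8 * rho),
        3: P > 2 * (1 + rho) * (beta + eps) + (1 + rho) * max(delta, beta + eps)
        + rho * delta,
        4: P <= beta / (4 * rho) - eps / rho - rho * (beta + delta + eps)
        - 2 * beta - delta - 2 * eps,
        5: beta >= 4 * eps + 4 * rho * (3 * beta + delta + 3 * eps)
        + 8 * rho**2 * (beta + delta + eps),
    }
    return [number for number, holds in checks.items() if not holds]
```

As an example with made-up values delta = 0.01, eps = 0.001, rho = 1e-6, P = 10 and beta_r = 0.0041, the choice beta = 0.007 satisfies all five, while beta = 0.005 violates only inequality (2), since beta must be roughly beta_r + 2ε or more.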


First we show that $\beta_r$ can be attained by the start-up algorithm.


Lemma 5-12: There exists an integer $i$ such that $B^i \le \beta_r$.

Proof: Since $\beta_r$ must be larger than $4\epsilon + 4\rho(11\delta + 39\epsilon)$, the result follows from
Theorem 5-11, which states that the closeness of synchronization approaches $4\epsilon +
4\rho(11\delta + 39\epsilon)$ as the round number, $i$, increases. $\Box$


Note that the number of rounds, r, that the processes agree on is $\ge i$, and that the worst-case $B^r$ is
no more than the worst-case $B^i$, which is at most $\beta_r$.


Lemma 5-13 shows that the first multiple of P reached by a nonfaulty process after finishing the 
start-up algorithm differs by at most one from that reached by another nonfaulty process. 
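Lemma 5-13: Let p and q be nonfaulty processes. Then

$$|C_p^r(t_p^r) - C_q^r(t_q^r)| < P.$$

Proof:

$$\begin{aligned}
|C_p^r(t_p^r) - C_q^r(t_q^r)| &\le |C_p^r(t_p^r) - C_q^r(t_p^r)| + (1 + \rho)|t_q^r - t_p^r| \\
&\le |C_p^r(t_p^r) - C_q^r(t_p^r)| + (1 + \rho)(\delta + 3\epsilon), \text{ by Lemma 5-2} \\
&\le |(C_p^r(t_p^r) - C_p^r(\mathrm{tmax}^r)) - (C_q^r(t_p^r) - C_q^r(\mathrm{tmax}^r))| + |C_p^r(\mathrm{tmax}^r) - C_q^r(\mathrm{tmax}^r)| \\
&\qquad + (1 + \rho)(\delta + 3\epsilon) \\
&\le 2\rho(\mathrm{tmax}^r - t_p^r) + \beta_r + (1 + \rho)(\delta + 3\epsilon), \text{ by Lemma 4-2 and definition of } \beta_r \\
&\le 2\rho(\delta + 3\epsilon) + \beta_r + (1 + \rho)(\delta + 3\epsilon), \text{ by Lemma 5-2} \\
&= \beta_r + (1 + 3\rho)(\delta + 3\epsilon).
\end{aligned}$$

Suppose in contradiction that $P \le \beta_r + (1 + 3\rho)(\delta + 3\epsilon)$. By solving inequality (2) for
$\beta_r$, we get

$$\beta_r \le (\beta - 2\epsilon - \rho(8\beta + 2\delta + 12\epsilon + 6P)) / (1 - \rho),$$

which implies that

$$P \le (\beta - 2\epsilon - \rho(8\beta + 2\delta + 12\epsilon + 6P)) / (1 - \rho) + (1 + 3\rho)(\delta + 3\epsilon).$$

This simplifies to $P \le (\beta + \delta + \epsilon - 8\rho\beta + \rho\delta - 3\rho\epsilon) / (1 + 5\rho)$.

Combining this with inequality (3) yields

$$2(1 + \rho)(\beta + \epsilon) + (1 + \rho)\delta + \rho\delta < P \le (\beta + \delta + \epsilon - 8\rho\beta + \rho\delta - 3\rho\epsilon) / (1 + 5\rho).$$

Solving for $\beta$ gives $\beta < -(\epsilon + 6\rho\delta + 15\rho\epsilon) / (1 + 20\rho)$, which is a contradiction. $\Box$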
The rest of the section is devoted to showing that the difference in real times when nonfaulty
processes' clocks reach the first multiple of P at which they will all perform the maintenance
algorithm is less than or equal to $\beta$. Consequently, this $\beta$ can be preserved by the maintenance
algorithm.


Define kP to be the first multiple of P reached by any nonfaulty process' r-th clock. The first
multiple of P reached by any other nonfaulty process is either kP or (k + 1)P, by Lemma 5-13. At
(k + 1)P some of the nonfaulty processes will actually update their clocks, and at (k + 2)P all of
them will update their clocks.


Recall that $(k + 1)P = T^{k+1}$ and $U^{k+1} = T^{k+1} + (1 + \rho)(\beta + \delta + \epsilon)$. Let $u_p^{k+1} = c_p^r(U^{k+1})$ and
similarly for q.
Let s and t be two nonfaulty processes. Here is a description of the worst case:
• s has the smallest clock value at $\mathrm{tmax}^r$, barely above (k - 1)P, and its clock is slow.
• t's clock is fast and is $\beta_r$ ahead of s's at $\mathrm{tmax}^r$.
• s updates its clock at $U^{k+1}$, by decrementing it as much as possible.
• t updates its clock at $U^{k+1}$, by incrementing it as much as possible.


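First we must bound how far apart in real time nonfaulty processes' r-th clocks reach $U^{k+1}$.
Lemma 5-14: Let p and q be nonfaulty processes. Then

$$|c_p^r(U^{k+1}) - c_q^r(U^{k+1})| \le (1 - \rho)\beta_r + 2\rho(2P + \beta + \delta + \epsilon).$$

Proof: Without loss of generality, suppose $c_p^r(U^{k+1}) \ge c_q^r(U^{k+1})$. Then

$$\begin{aligned}
|c_p^r(U^{k+1}) - c_q^r(U^{k+1})| &= c_p^r(U^{k+1}) - c_q^r(U^{k+1}) \\
&= (c_p^r(U^{k+1}) - \mathrm{tmax}^r) - (c_q^r(U^{k+1}) - \mathrm{tmax}^r) \\
&\le (C_p^r(u_p^{k+1}) - C_p^r(\mathrm{tmax}^r))(1 + \rho) - (C_q^r(u_q^{k+1}) - C_q^r(\mathrm{tmax}^r))(1 - \rho), \text{ by the bounds on the drift rate} \\
&\le (2P + (1 + \rho)(\beta + \delta + \epsilon))(1 + \rho) - (2P + (1 + \rho)(\beta + \delta + \epsilon) - \beta_r)(1 - \rho) \\
&= (1 - \rho)\beta_r + 2\rho(2P + \beta + \delta + \epsilon). \quad \Box
\end{aligned}$$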


Next, we bound the additional spread introduced by the resetting of the clocks. 
Lemma 5-15: Let s and t be the nonfaulty processes described above. Then

(a) $c_s^{r+1}(U^{k+1}) - c_s^r(U^{k+1}) \le (1 + \rho)(\epsilon + \rho(4\beta + \delta + 5\epsilon))$, and

(b) $c_t^r(U^{k+1}) - c_t^{r+1}(U^{k+1}) \le (1 + \rho)(\epsilon + \rho(4\beta + \delta + 5\epsilon))$.

Proof: (a) By Lemma 4-15, we know that s's new clock is at most $\alpha = \epsilon + \rho(4\beta + \delta +
5\epsilon)$ less than the "smallest" of the previous nonfaulty clocks at $c_s^r(U^{k+1}) = u_s^{k+1}$.
Since s had the smallest clock before, $C_s^{r+1}(u_s^{k+1}) \ge C_s^r(u_s^{k+1}) - \alpha$. By the lower
bound on the drift rate,

$$c_s^{r+1}(U^{k+1}) - c_s^r(U^{k+1}) \le (1 + \rho)\alpha.$$

(b) Lemma 4-15 also states that t's new clock is at most $\alpha$ more than the "largest" of
the previous nonfaulty clocks at $u_t^{k+1}$, which was t's clock. The argument is similar to
(a). $\Box$


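Finally, we can bound the maximum difference in real time between two nonfaulty processes'
clocks reaching $T^{k+2}$. Let $i_p$ be the index of p's logical clock that is in effect when $T^{k+2}$ is
reached.

Theorem 5-16: Let p and q be nonfaulty processes and $i = i_p$ and $j = i_q$. Then

$$|c_p^i(T^{k+2}) - c_q^j(T^{k+2})| \le \beta.$$

Proof: Without loss of generality, suppose $c_p^i(T^{k+2}) \ge c_q^j(T^{k+2})$. Then

$$\begin{aligned}
|c_p^i(T^{k+2}) - c_q^j(T^{k+2})| &= c_p^i(T^{k+2}) - c_q^j(T^{k+2}) \\
&\le c_s^{r+1}(T^{k+2}) - c_t^{r+1}(T^{k+2})
\end{aligned}$$

for nonfaulty processes s and t that behave as described above.
We know from Lemma 4-2 that

$$(c_s^{r+1}(T^{k+2}) - c_t^{r+1}(T^{k+2})) - (c_s^{r+1}(U^{k+1}) - c_t^{r+1}(U^{k+1})) \le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)).$$

Thus

$$\begin{aligned}
c_s^{r+1}(T^{k+2}) - c_t^{r+1}(T^{k+2}) &\le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + c_s^{r+1}(U^{k+1}) - c_t^{r+1}(U^{k+1}) \\
&= 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + c_s^{r+1}(U^{k+1}) - c_s^r(U^{k+1}) + c_t^r(U^{k+1}) - c_t^{r+1}(U^{k+1}) \\
&\qquad + c_s^r(U^{k+1}) - c_t^r(U^{k+1}) \\
&\le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + 2(1 + \rho)(\epsilon + \rho(4\beta + \delta + 5\epsilon)) \\
&\qquad + c_s^r(U^{k+1}) - c_t^r(U^{k+1}), \text{ by Lemma 5-15} \\
&\le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + 2(1 + \rho)(\epsilon + \rho(4\beta + \delta + 5\epsilon)) \\
&\qquad + (1 - \rho)\beta_r + 2\rho(2P + \beta + \delta + \epsilon), \text{ by Lemma 5-14} \\
&\le \beta, \text{ by inequality (2).} \quad \Box
\end{aligned}$$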


This $\beta$ is approximately $6\epsilon$, which is slightly larger than the smallest one maintainable, $4\epsilon$. To
shrink it back down, P can be made slightly smaller than required by the maintenance algorithm,
as long as the lower bound of inequality (3) isn't violated. Since the synchronization procedure is
performed more often, the clocks don't drift apart as much, and consequently, they can be more
closely synchronized. Once the desired $\beta$ is reached, P can be increased again. (The
computational costs associated with performing the synchronization procedure and the possible
degradation of validity may make it advisable to resynchronize more infrequently.)


5.6 Using Only the Start-up Algorithm


A natural idea is to use Algorithm 5-1 solely, and never switch to the maintenance algorithm. Both
algorithms can synchronize clocks to within approximately $4\epsilon$, so such a policy would sacrifice
very little in accuracy. Using just the one algorithm is conceptually simpler and avoids
introducing the additional error during the switch-over. However, if the system does no work
during the period of time when processes have clocks with different indices, it is important to
minimize this interval. Algorithm 5-1 has such an interval of length $\delta + 3\epsilon$; for Algorithm 4-1, it is
approximately $\beta + 2\rho(\beta + \delta + \epsilon)$. Depending on the choice of values for the parameters,
Algorithm 4-1 may be superior in this regard.

Chapter Six 


Conclusion 


6.1 Summary 


In conclusion, we have presented a precise formal model to describe a system of distributed 
processes, each of which has its own clock. Within this model we proved a lower bound on how
closely clocks can be synchronized even under strong simplifying assumptions. 


The major part of the thesis was the description and analysis of an algorithm to synchronize the 
clocks of a completely connected network in the presence of clock drift, uncertainty in the 
message delivery time, and Byzantine process faults. Since it does not use digital signatures, the 
algorithm requires that more than two thirds of the processes be nonfaulty. Our algorithm is an
improvement over those in [7] based on Byzantine Agreement protocols in that the number of
messages per round is $n^2$ instead of exponential, and that the size of the adjustment made at each
round is a small amount independent of the number of faults.


The algorithm in [5] works for a more general communication network, and, since it uses digital
signatures, only requires that more than half the processes be nonfaulty. However, the size of the
adjustment depends on the number of faulty processes.


The issue of which algorithm synchronizes the most closely is difficult to resolve because of
differing assumptions about the underlying model. For instance, Algorithm 4-1 of this thesis can 
achieve a closeness of synchronization of approximately 4e in our notation. However, we assume 
that local processing time is negligible; otherwise Lamport [8] claims that actually there is an 
implicit factor of n in the e, in which case the closeness of synchronization achieved by our 
algorithm depends on the number of processes as do those in [7]. 


We also modified Algorithm 4-1 to produce an algorithm to establish synchronization initially 
among clocks with arbitrary values. This algorithm also handles clock drift, uncertainty in the 
message delivery time, and Byzantine process faults. This problem, as far as we know, had not 
been addressed previously for real-time clocks. 




6.2 Open Questions 


It would be interesting to know more lower bounds on the closeness of synchronization 
achievable. For example, a question posed by J. Halpern is to determine a lower bound when the
communication network has an arbitrary configuration and the uncertainty in the message 
delivery time is different for each link. 


There are also no known lower bounds for the case of clock drift and faulty processes. 


The validity of Algorithm 5-1 has not been computed. If this algorithm were used solely, knowing
how the processes’ clocks increase in relation to real time would be of interest. Lower bounds in 
general for the validity conditions are not known. 


It seems reasonable that there is a tradeoff between the closeness of synchronization and the
validity, since the synchronization procedure must be performed more often in order to
synchronize more closely, but each resynchronization event potentially worsens the validity. This
tradeoff has not been quantified.


M. Fischer [4] has suggested an “asynchronous” version of Algorithm 5-1 to establish 
synchronization. In his version, a nonfaulty process wakes up at an arbitrary time with arbitrary 
values for its correction variable and array of differences. Every P as measured on its physical
(not logical) clock, the process performs the fault-tolerant averaging function and updates its 
clock. It seems that the clock values should converge, but at what rate? 


What kind of algorithms that use the fault-tolerant averaging function can be used in more general
communication graphs?


Another avenue of investigation is using the fault-tolerant averaging function together with the
capability for authentication to see if algorithms with higher fault-tolerance than those of this
thesis and better accuracy than those in [5] can be designed.
Appendix A


Multisets 


This Appendix consists of definitions and lemmas concerning multisets, needed for the proofs of
Lemmas 4-9 and 5-10. These definitions and lemmas are analogous to some in [1]. 


A multiset U is a finite collection of real numbers in which the same number may appear more
than once. The largest value in U is denoted max(U), and the smallest value in U is denoted
min(U). The diameter of U, diam(U), is max(U) - min(U). Let s(U) be the multiset obtained by
deleting one occurrence of min(U), and l(U) be the multiset obtained by deleting one occurrence
of max(U). If $|U| \ge 2f + 1$, we define reduce(U) to be $l^f s^f(U)$, the result of removing the f largest
and f smallest elements of U.
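These definitions translate directly into code. In the Python sketch below a multiset is represented as a plain list, which is an assumption of the example; the names `reduce_multiset` and `diam` are chosen to mirror reduce and diam.

```python
def reduce_multiset(values, f):
    """Remove the f largest and f smallest elements; requires at least 2f + 1 values."""
    if len(values) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 elements")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def diam(values):
    """Diameter of a nonempty multiset: largest value minus smallest value."""
    return max(values) - min(values)
```

For instance, reducing [5, 1, 9, 3, 7] with f = 1 leaves [3, 5, 7], whose diameter is 4.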


Given two multisets U and V with $|U| \le |V|$, consider an injection c mapping U to V. For any
nonnegative real number x, define $S_x(c)$ to be $\{u \in U : |u - c(u)| > x\}$. We define the x-distance
between U and V to be $d_x(U,V) = \min_c\{|S_x(c)|\}$. We say c witnesses $d_x(U,V)$ if $|S_x(c)| = d_x(U,V)$.
The x-distance between U and V is the number of elements of U that cannot be matched up with
an element of V which is the same to within x. If $|u - c(u)| \le x$, then we say u and c(u) are x-paired
by c. The midpoint of U, mid(U), is $\tfrac{1}{2}[\max(U) + \min(U)]$.
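The x-distance can be computed for small examples by finding a largest set of disjoint x-pairs: every element of U left unpaired must be sent by the injection to some element farther than x away. The Python sketch below uses a standard augmenting-path bipartite matching for this; that choice of method, the list representation, and the names `x_distance` and `mid` are assumptions of the example rather than anything prescribed by the text.

```python
def mid(values):
    """Midpoint of a nonempty multiset."""
    return (max(values) + min(values)) / 2


def x_distance(U, V, x):
    """d_x(U, V) for lists U and V with len(U) <= len(V)."""
    if len(U) > len(V):
        raise ValueError("need |U| <= |V|")
    match_of_v = [None] * len(V)  # index of the U element x-paired with each V element

    def try_pair(i, seen):
        # Try to x-pair U[i], re-pairing earlier elements of U if necessary.
        for j, v in enumerate(V):
            if abs(U[i] - v) <= x and j not in seen:
                seen.add(j)
                if match_of_v[j] is None or try_pair(match_of_v[j], seen):
                    match_of_v[j] = i
                    return True
        return False

    paired = sum(try_pair(i, set()) for i in range(len(U)))
    return len(U) - paired
```

For U = [1, 5], V = [1.05, 9, 10] and x = 0.1, only the element 1 can be x-paired, so the distance is 1.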


For any multiset U and real number r, define U + r to be the multiset obtained by adding r to every
element of U; that is, $U + r = \{u + r : u \in U\}$. It is obvious that mid and reduce are invariant
under this operation, in the sense that mid(U + r) = mid(U) + r and reduce(U + r) = reduce(U) + r.
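The shift property is what allows a common clock value to be pulled out of mid(reduce(...)) in the proof of Theorem 5-10. A small Python check, with hypothetical helper names and lists standing in for multisets:

```python
def fault_tolerant_midpoint(values, f):
    """mid(reduce(values)): drop the f largest and f smallest, then take the midpoint."""
    if len(values) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 elements")
    ordered = sorted(values)
    kept = ordered[f:len(ordered) - f]
    return (kept[0] + kept[-1]) / 2


def shift(values, r):
    """The multiset U + r."""
    return [u + r for u in values]
```

Shifting every element by r moves the fault-tolerant midpoint by exactly r, because sorting order, and therefore the set of discarded elements, is unaffected by the shift.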


The next lemma bounds the diameter of a reduced multiset. 
Lemma A-1: Let U and W be multisets such that $|U| = n$, $|W| = n - f$, and $d_x(W,U) =
0$, where $n \ge 2f + 1$. Then

$$\max(\mathrm{reduce}(U)) \le \max(W) + x \text{ and } \min(\mathrm{reduce}(U)) \ge \min(W) - x.$$

Proof: We show the result for max; a similar argument holds for min. Let c witness
$d_x(W,U)$. Suppose none of the f elements deleted from the high end of U are x-paired
with elements of W by c. Since $d_x(W,U) = 0$, the remaining $n - f$ elements of U are
x-paired with elements of W by c, and thus every element of reduce(U) is x-paired with
an element of W. Suppose max(reduce(U)) is x-paired with w in W by c. Then
$\max(\mathrm{reduce}(U)) \le w + x \le \max(W) + x$.


Now suppose one of the elements deleted from the high end of U is x-paired with an
element of W by c. Let u be the largest such, and suppose it was paired with w in
W. Then $\max(\mathrm{reduce}(U)) \le u \le w + x \le \max(W) + x$. $\Box$


We show that the x-distance between two multisets is not increased by removing the largest (or 


smallest) element from each. 
Lemma A-2: Let U and V be multisets, each with at least one element. Then 
d (KU),1(V)) < d, (U,V) and d,(s(U),s(V)) < d)(U,V). 
Proof: We give the proof in detail for |; a symmetric argument holds for s. Let M = K(U) | 
and N = |(V). Let c witness d Us, V). We construct an injection c’ from M to N and 
show that |S, (cl < IS, (c}l. Since d.(M,N) < |S,(c’ }j and is, (c)| = d,(U,V), it follows 
that d (MN) Sd (U,V). 


Suppose u = max(U) and v = max(V). (These are the deleted elements.) 


Case 1: c(u) = v. Define c’(m) = c(m) for all m in M. Obviously c’ is an injection. 
IS, (cl SIS, (¢)] since either S(c’) = S,(c) or S(c’) = S,(c) - {u}. 


Case 2: c(u) # v and there is no u’ in U such that c{u’) = v. This is the same as Case . 
Te 2 : , 


Case 3: c(u) # v, and there is u’ in U such that c{u’) = v. Suppose c(u) = v’. Define 
c’(u’) = v’ and c’(m) = c(m) for ail m in M besides v’. Obviously c’ is an injection. Now 
we show that |S (c’)| < [S,(c)}. 


If u or u' or both are in S_x(c) then whether or not u' is in S_x(c') the inequality holds. The 
only trouble arises if u and u' are both not in S_x(c) but u' is in S_x(c'). Suppose that is 
the case. Then |u' - c'(u')| = |u' - v'| > x. There are two possibilities: 


(i) u' > v' + x. Since u is not in S_x(c), |u - c(u)| = |u - v'| ≤ x. So v' ≥ u - x. Hence u' > 
v' + x ≥ u - x + x, which implies that u' > u. But this contradicts u being the largest 
element of U. 


(ii) v' > u' + x. Since u' is not in S_x(c), |u' - c(u')| = |u' - v| ≤ x. So u' ≥ v - x. Hence 
v' > u' + x ≥ v - x + x, which implies that v' > v. But this contradicts v being the 
largest element of V. 
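
Lemma A-2 can also be sanity-checked by random testing. The sketch below is an illustration rather than part of the proof. It assumes that d_x(U,V) equals |U| minus the size of a maximum matching in which u may be matched to v only when |u - v| ≤ x, and it computes such a matching with a standard two-pointer greedy pass over the sorted values. All function names are hypothetical.

```python
import random

def x_distance(u, v, x):
    """d_x(U,V) for |U| <= |V|, computed as |U| minus a maximum x-pairing."""
    a, b = sorted(u), sorted(v)
    i = j = paired = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= x:   # smallest remaining elements can be x-paired
            paired += 1
            i += 1
            j += 1
        elif a[i] < b[j]:           # a[i] is too small for every remaining b
            i += 1
        else:                       # b[j] is too small for every remaining a
            j += 1
    return len(a) - paired

def drop_largest(u):                # l(U)
    return sorted(u)[:-1]

def drop_smallest(u):               # s(U)
    return sorted(u)[1:]

def check_lemma_a2(trials=1000, seed=0):
    """Return True if no random instance violates either inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        size_u = rng.randint(1, 6)
        size_v = rng.randint(size_u, 7)
        U = [rng.randint(0, 20) for _ in range(size_u)]
        V = [rng.randint(0, 20) for _ in range(size_v)]
        x = rng.randint(0, 5)
        d = x_distance(U, V, x)
        if x_distance(drop_largest(U), drop_largest(V), x) > d:
            return False
        if x_distance(drop_smallest(U), drop_smallest(V), x) > d:
            return False
    return True

assert check_lemma_a2()
```

Integer values are used so that the comparison |u - v| ≤ x is exact; a passing run is evidence for the lemma on these instances only, not a substitute for the proof.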


The next lemma shows that the results of reducing two multisets, each of whose x-distance from a 
third multiset is 0, can't contain values that are too far apart. 

Lemma A-3: Let U, V, and W be multisets such that |U| = |V| = n and |W| = n - f, 
where n > 3f. If d_x(W,U) = 0 and d_x(W,V) = 0, then 


min(reduce(U)) - max(reduce(V)) ≤ 2x. 
Proof: First we show that d_2x(U,V) ≤ f. Let c_U witness d_x(W,U) and c_V witness 
d_x(W,V). Define an injection c from U to V as follows: if there is w in W such that c_U(w) 
= u, then let c(u) = c_V(w); otherwise, let c(u) be any unused element of V. For each of 
the n - f elements w in W, there is u in U such that u = c_U(w). Thus |u - c(u)| ≤ |u - w| 
+ |w - c(u)| = |c_U(w) - w| + |w - c_V(w)| ≤ x + x = 2x. Thus |S_2x(c)| ≤ f, so d_2x(U,V) ≤ 
f. 


Then by applying Lemma A-2 f times, we know that d_2x(reduce(U),reduce(V)) ≤ f. 
Since |reduce(U)| = |reduce(V)| = n - 2f > f, there are u in reduce(U) and v in 
reduce(V) such that |u - v| ≤ 2x. Thus min(reduce(U)) - max(reduce(V)) ≤ u - v ≤ 2x. ∎ 
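
The bound of Lemma A-3 can be exercised on randomly generated instances. In the sketch below, which is an illustration under stated assumptions and not part of the original argument, each of U and V is built from W by moving every element by at most x and then adding f arbitrary values, so that d_x(W,U) = 0 and d_x(W,V) = 0 hold by construction. The helper names and the value ranges are hypothetical.

```python
import random

def reduce_multiset(u, f):
    """Delete the f smallest and f largest elements."""
    s = sorted(u)
    return s[f:len(s) - f]

def noisy_view(w, f, x, rng):
    """A multiset of size |W| + f whose x-distance from W is 0 by construction."""
    close = [value + rng.randint(-x, x) for value in w]
    arbitrary = [rng.randint(-1000, 1000) for _ in range(f)]
    return close + arbitrary

def worst_gap(trials=2000, seed=0):
    """Largest observed value of min(reduce(U)) - max(reduce(V)) - 2x."""
    rng = random.Random(seed)
    worst = None
    for _ in range(trials):
        f = rng.randint(0, 3)
        n = 3 * f + 1 + rng.randint(0, 3)   # guarantees n > 3f
        x = rng.randint(0, 5)
        W = [rng.randint(0, 50) for _ in range(n - f)]
        U = noisy_view(W, f, x, rng)
        V = noisy_view(W, f, x, rng)
        gap = min(reduce_multiset(U, f)) - max(reduce_multiset(V, f)) - 2 * x
        worst = gap if worst is None else max(worst, gap)
    return worst

# The lemma predicts that the gap never exceeds 0.
assert worst_gap() <= 0
```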


Lemma A-4 is the main multiset result. It bounds the difference between the midpoints of two 
reduced multisets in terms of a particular third multiset. 

Lemma A-4: Let U, V, and W be multisets such that |U| = |V| = n and |W| = n - f, 
where n > 3f. If d_x(W,U) = 0 and d_x(W,V) = 0, then 


|mid(reduce(U)) - mid(reduce(V))| ≤ ½ diam(W) + 2x. 
Proof: |mid(reduce(U)) - mid(reduce(V))| 


= ½ |max(reduce(U)) + min(reduce(U)) - max(reduce(V)) - min(reduce(V))| 
= ½ |max(reduce(U)) - min(reduce(V)) + min(reduce(U)) - max(reduce(V))|. 


If the quantity inside the absolute value signs is nonnegative, this expression is equal 
to 


½ [max(reduce(U)) - min(reduce(V)) + min(reduce(U)) - max(reduce(V))] 


≤ ½ (max(W) + x - (min(W) - x) + min(reduce(U)) - max(reduce(V))), by applying 
Lemma A-1 twice 


= ½ (diam(W) + 2x + min(reduce(U)) - max(reduce(V))) 
≤ ½ (diam(W) + 2x + 2x), by Lemma A-3 
= ½ diam(W) + 2x. 


If the quantity inside the absolute value is nonpositive, then symmetric reasoning gives 
the result. ∎ 
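
To make the statement concrete, the following Python sketch evaluates both sides of Lemma A-4 on one hand-picked instance with n = 4, f = 1 and x = 1. The instance is hypothetical and was chosen so that the bound is met with equality: every element of U lies x above the corresponding element of W, every element of V lies x below it, and each multiset contains one arbitrary extra value. The function names are illustrative only.

```python
def reduce_multiset(u, f):
    """Delete the f smallest and f largest elements."""
    s = sorted(u)
    return s[f:len(s) - f]

def mid(u):
    """Midpoint of a multiset: half the sum of its largest and smallest values."""
    return (max(u) + min(u)) / 2

def diam(u):
    """Diameter of a multiset: largest value minus smallest value."""
    return max(u) - min(u)

f, x = 1, 1
W = [0, 2, 4]            # |W| = n - f = 3
U = [1, 3, 5, 100]       # W shifted up by x, plus one arbitrary value
V = [-1, 1, 3, -100]     # W shifted down by x, plus one arbitrary value

lhs = abs(mid(reduce_multiset(U, f)) - mid(reduce_multiset(V, f)))
bound = diam(W) / 2 + 2 * x

assert lhs <= bound
print(lhs, bound)        # prints "4.0 4.0"
```

On this instance min(reduce(U)) - max(reduce(V)) = 3 - 1 = 2x, so the inequality of Lemma A-3 is met with equality as well.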